High School Computer Science

Strings and Text Processing字符串与文本处理

A string is a sequence of characters, and every piece of text your program touches is a string. This guide builds from the ground up: how strings are indexed and sliced in Python; which built-in methods (upper, lower, split, join, replace) do the heavy lifting; how concatenation and f-strings produce formatted output; how to search inside a string with in, find, and count; how to loop character by character; and how to combine these tools in a complete text-processing program. All code is Python, with bilingual explanations throughout.字符串(string)是字符(character)的序列,程序处理的每一段文本都是字符串。本指南从基础讲起:Python 中字符串的索引(index)和切片(slicing)、哪些内置方法(method)承担重活(upper、lower、split、join、replace)、如何用拼接(concatenation)和 f-string 生成格式化(formatting)输出、如何用 in、find、count 在字符串中搜索、如何逐字符遍历,以及如何将这些工具组合成完整的文本处理(text processing)程序。全部代码为 Python;全篇双语说明。

7 sections7 节内容 · US CSTA · AP CSP · ON · BC · AB · Python code + worked examplesPython 代码 + 例题

How to use this guide如何使用本指南

Strings are present in every curriculum, but the depth of coverage varies. AP CSP Topic 3.4 directly assesses string operations via the exam reference sheet. Ontario ICS3C A1.2 is the most explicit string-manipulation expectation, naming character swapping, capitalisation, extracting substrings, and counting occurrences — all covered here. Ontario ICS3U A1.1 covers strings as a variable type. BC has no dedicated string standard but anchors to "manipulate numbers and text" in Computer Studies 10. Alberta CSE1110/CSE1120 name strings as a data type and concatenation as an operator. The table below shows which rows are core for you; each section cites the curriculum document it was checked against.字符串出现在所有课程大纲中,但覆盖深度各有不同。AP CSP 主题 3.4 通过考试参考手册直接评估字符串操作。安大略 ICS3C A1.2 是最明确的字符串操作期望,点名了字符交换、首字母大写、提取子字符串和计算出现次数——本指南均有涵盖。安大略 ICS3U A1.1 将字符串作为变量类型涵盖。BC 没有专门的字符串标准,但依附于 Computer Studies 10 中的"操作数字和文本"。阿尔伯塔 CSE1110/CSE1120 将字符串列为数据类型,将拼接列为运算符。下表显示哪些行对你是核心内容;每节均注明所依据的课纲文件。

If you are in…如果你在… Focus on these sections重点学习 Defer / lighter可推迟 / 减负 Source依据
🇺🇸 US CSTA / AP CSP美国 CSTA / AP CSP §1 through §7 in full. AP CSP Topic 3.4 (Strings) and Skill 4.B assess string operations directly on the exam. CSTA 3B-AP-12 names strings as a fundamental data structure.§1 至 §7 完整学习。AP CSP 主题 3.4(字符串)和技能 4.B 在考试中直接评估字符串操作。CSTA 3B-AP-12 将字符串列为基本数据结构。 The word-count worked example (§7) is enrichment; the core exam skill is reading and predicting string-expression results.词频统计例题(§7)为拓展内容;核心考试技能是阅读和预测字符串表达式的结果。 CSTA K-12 and AP CSP — CSTA 3B-AP-12; AP CSP Big Idea 3 Topic 3.4; Skill 4.B— CSTA 3B-AP-12;AP CSP 大概念 3 主题 3.4;技能 4.B
🇨🇦 ON Grade 11 — ICS3U / ICS3C安大略 11 年级 — ICS3U / ICS3C §1 through §7. ICS3U A1.1 requires strings as a variable type (§1). ICS3C A1.2 requires character-level manipulation, substring extraction, and occurrence counting (§3, §5, §7).§1 至 §7。ICS3U A1.1 要求字符串作为变量类型(§1)。ICS3C A1.2 要求字符级操作、子字符串提取和出现次数计数(§3、§5、§7)。 ICS3U university floor: §6 (looping over characters) and §7 (full word-count program) are enrichment; ICS3C college stream: all seven sections are assessed.ICS3U 大学预备基础:§6(遍历字符)和 §7(完整词频统计程序)为拓展;ICS3C 大学课程:全部七节均被评估。 ON/BC Computer Studies 11-12 — ICS3U A1.1; ICS3C A1.2— ICS3U A1.1;ICS3C A1.2
🇨🇦 BC — Computer Studies 10BC — Computer Studies 10 §1 through §4 as core (string basics, slicing, methods, formatting). §5–§7 (searching, looping, word-count) as enrichment.§1 至 §4 为核心(字符串基础、切片、方法、格式化)。§5–§7(搜索、遍历、词频统计)为拓展。 BC has no dedicated string-processing standard; all content here sits under the general "store and manipulate numbers and text" bullet.BC 没有专门的字符串处理标准;本指南所有内容均在通用"存储和操作数字和文本"条目下。 ON/BC Computer Studies 11-12 — BC CS10 "store and manipulate numbers and text"— BC CS10"存储和操作数字和文本"
🇨🇦 AB — CSE1110 / CSE1120阿尔伯塔 — CSE1110 / CSE1120 §1 through §4 as core. CSE1110 outcome 2.4.3 (strings as a data type) and 2.4.6 (concatenation operators) are the assessed anchors. §5–§7 map to CSE1120 outcome 2.6 (concatenation and interpolation operators).§1 至 §4 为核心。CSE1110 结果 2.4.3(字符串作为数据类型)和 2.4.6(拼接运算符)是被评估的依据。§5–§7 对应 CSE1120 结果 2.6(拼接和插值运算符)。 Alberta CSE has no standalone string-processing module; all content here sits within the Structured Programming 1/2 outcomes.阿尔伯塔 CSE 没有独立的字符串处理模块;本指南所有内容均在结构化编程 1/2 结果范围内。 Alberta CTS Computing Science — CSE1110 outcomes 2.4.3, 2.4.6; CSE1120 outcome 2.6— CSE1110 结果 2.4.3、2.4.6;CSE1120 结果 2.6
If you are cramming the night before如果你在临阵磨枪

Memorise: indexing is zero-based; slicing is s[start:stop] (stop excluded); the five key methods (upper, lower, split, join, replace); f-string syntax f"..."; and the in operator for membership testing. Read every cram-cheat box and skim the worked examples.背熟:索引从零开始;切片为 s[start:stop](stop 不含);五个关键方法(upper、lower、split、join、replace);f-string 语法 f"...";以及用于成员测试的 in 运算符。阅读每个速记框并浏览例题。

If you are going for the top mark如果你目标顶分

Work through all seven sections and the worked example in §7. AP CSP Skill 4.B asks you to trace string expressions step by step, so practice reading s[1:4], s.split(","), and " ".join(words) without running the code. ON ICS3C A1.2 requires you to write programs that extract substrings and count occurrences from scratch.完整学习全部七节及 §7 的例题。AP CSP 技能 4.B 要求你逐步追踪字符串表达式:练习在不运行代码的情况下阅读 s[1:4]、s.split(",") 和 " ".join(words)。ON ICS3C A1.2 要求你从头编写提取子字符串和计算出现次数的程序。


String Basics and Indexing字符串基础与索引

Three facts to lock in before anything else.先记住这三点,再学其他内容。
  • A string is a sequence of characters.字符串是字符的序列。 Every character — letter, digit, space, punctuation — is stored at a numbered position called an index. Indices start at 0, not 1.每个字符——字母、数字、空格、标点——都存储在一个称为索引的编号位置。索引从 0 开始,不是 1。
  • Read one character with bracket notation.用方括号表示法读取单个字符。 s[0] is the first character; s[-1] is the last character; s[-2] is the second-to-last.s[0] 是第一个字符;s[-1] 是最后一个字符;s[-2] 是倒数第二个字符。
  • Strings are immutable.字符串是不可变的。 You cannot change a character in place (s[0] = "X" raises a TypeError). Every "modification" creates a new string object.你无法就地修改字符(s[0] = "X" 会引发 TypeError)。每次"修改"都会创建一个新的字符串对象。
Curriculum anchor: AB CSE1110 outcome 2.4.3 says "use appropriate data types such as … strings"; ON ICS3U A1.1 requires "use … strings … correctly in computer programs." Both assessments expect you to declare a string variable and access its characters.课程依据:AB CSE1110 结果 2.4.3 要求"使用适当的数据类型,如……字符串";ON ICS3U A1.1 要求"在计算机程序中正确使用……字符串……"。两项评估都要求你声明字符串变量并访问其字符。
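The three facts above can be checked in a few lines. This is a minimal sketch; the names word, changed, and assignment_failed are illustrative only and do not come from any curriculum document.

```python
word = "Python"

# Zero-based indexing and negative indices
first = word[0]      # "P"
last = word[-1]      # "n"

# Strings are immutable: item assignment raises TypeError
try:
    word[0] = "J"
    assignment_failed = False
except TypeError:
    assignment_failed = True

# To "change" a character, build a new string instead
changed = "J" + word[1:]

print(first, last, assignment_failed, changed)  # P n True Jython
```

Note that word itself is still "Python" afterwards; only the new name changed refers to the modified text.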
Worked Example 1 · Indexing into a string例题 1 · 字符串索引

Predict the output of each line without running the code.不运行代码,预测每行的输出。

name = "Dingrui"
print(name[0])    # D
print(name[3])    # g
print(name[-1])   # i
print(name[-3])   # r
print(len(name))  # 7

以上代码:name[0] 取第 1 个字符,name[-1] 取最后一个字符,name[-3] 取倒数第三个字符,len() 返回字符串长度。Index map: D=0, i=1, n=2, g=3, r=4, u=5, i=6. Negative indices count from the right: -1=i, -2=u, -3=r. len("Dingrui") = 7.

Given s = "Python", what is s[2]?给定 s = "Python",s[2] 的值是什么?
§1 · Q1
"P""P"
"y""y"
"t""t"
"h""h"
Indices: P=0, y=1, t=2, h=3, o=4, n=5. So s[2] is "t".索引:P=0, y=1, t=2, h=3, o=4, n=5。因此 s[2] 为 "t"。
Indexing is zero-based: index 0 = "P", 1 = "y", 2 = "t". The answer is "t".索引从零开始:索引 0="P",1="y",2="t"。答案是 "t"。
Given s = "Hello", what is s[-1]?给定 s = "Hello",s[-1] 的值是什么?
§1 · Q2
"H""H"
"e""e"
"l""l"
"o""o"
Negative index -1 always refers to the last character. "Hello" ends in "o", so s[-1] = "o".负索引 -1 始终指向最后一个字符。"Hello" 以 "o" 结尾,因此 s[-1] = "o"。
s[-1] = last character = "o". s[0] = "H". Negative indices count from the right: -1 is last, -2 is second-to-last.s[-1] = 最后一个字符 = "o"。s[0] = "H"。负索引从右计数:-1 是最后,-2 是倒数第二。

String Slicing字符串切片

Slice syntax: s[start:stop] — includes start, excludes stop.切片语法:s[start:stop]——包含 start,不含 stop
  • s[1:4] — characters at indices 1, 2, 3 (NOT 4).— 索引 1、2、3 处的字符(不含 4)。
  • s[:3] — omit start → defaults to 0. Gives first 3 characters.— 省略 start 则默认为 0。返回前 3 个字符。
  • s[2:] — omit stop → goes to end. Gives everything from index 2 onward.— 省略 stop 则到末尾。返回从索引 2 开始的全部字符。
  • s[::2] — step of 2 → every other character. s[::-1] reverses the string.— 步长为 2 → 每隔一个字符取一个。s[::-1] 反转字符串。
The one fact that trips people: the stop index is excluded. If you want characters at positions 0, 1, 2 you write s[0:3], not s[0:2]. ON ICS3C A1.2 names "extract a portion of an address" — that is a slice.最容易出错的地方:stop 索引是不包含的。如果你想要位置 0、1、2 处的字符,你写 s[0:3],不是 s[0:2]。ON ICS3C A1.2 点名"提取地址的一部分"——那就是切片。
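The slice forms listed above can be compared side by side on one string. This is a short sketch using the word "computer" as an arbitrary example; the variable names are placeholders.

```python
s = "computer"   # c=0 o=1 m=2 p=3 u=4 t=5 e=6 r=7

first_three = s[:3]    # indices 0, 1, 2
from_two    = s[2:]    # index 2 to the end
middle      = s[1:4]   # indices 1, 2, 3 (4 is excluded)
every_other = s[::2]   # indices 0, 2, 4, 6
backwards   = s[::-1]  # whole string, right to left

print(first_three, from_two, middle, every_other, backwards)
# com mputer omp cmue retupmoc
```

A quick check for any slice: the length of s[start:stop] is stop - start, as long as both indices are inside the string.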
Worked Example 2 · Slicing a date string例题 2 · 切片日期字符串

Given date = "2024-06-15", extract the year, month, and day as separate strings.给定 date = "2024-06-15",将年、月、日提取为单独的字符串。

date = "2024-06-15"
year  = date[0:4]   # "2024"
month = date[5:7]   # "06"
day   = date[8:10]  # "15"
print(year, month, day)  # 2024 06 15

以上:date[0:4] 取索引 0~3(共 4 字符),date[5:7] 取索引 5~6,date[8:10] 取索引 8~9。这是 ON ICS3C A1.2"提取地址的一部分"的典型例子。Index positions: 2=0, 0=1, 2=2, 4=3, -=4, 0=5, 6=6, -=7, 1=8, 5=9. date[0:4] captures indices 0–3 = "2024". date[5:7] = indices 5–6 = "06". date[8:10] = indices 8–9 = "15". This is the canonical ON ICS3C A1.2 "extract a portion" operation.

Given s = "computer", what does s[2:5] return?给定 s = "computer",s[2:5] 返回什么?
§2 · Q1
"com""com"
"mpu""mpu"
"mput""mput"
"put""put"
"computer": c=0,o=1,m=2,p=3,u=4,t=5,e=6,r=7. s[2:5] = indices 2,3,4 = "mpu". Stop index 5 is excluded."computer":c=0,o=1,m=2,p=3,u=4,t=5,e=6,r=7。s[2:5] = 索引 2,3,4 = "mpu"。stop 索引 5 不包含。
Slice [2:5] gives indices 2, 3, 4 — the stop (5) is excluded. "computer"[2]="m", [3]="p", [4]="u" → "mpu".切片 [2:5] 给出索引 2、3、4——stop(5)不包含。"computer"[2]="m",[3]="p",[4]="u" → "mpu"。
What does "Hello"[::-1] return?"Hello"[::-1] 返回什么?
§2 · Q2
"olleH""olleH"
"Hello""Hello"
"H""H"
"lleH""lleH"
A step of -1 reverses the string. "Hello"[::-1] reads every character from right to left: "olleH".步长 -1 会反转字符串。"Hello"[::-1] 从右到左读取每个字符:"olleH"。
[::-1] = reverse. Start and stop are omitted (whole string), step is -1 (backwards). Result: "olleH".[::-1] = 反转。start 和 stop 省略(整个字符串),step 为 -1(反向)。结果:"olleH"。

String Methods字符串方法

Five methods to memorise; together they cover most everyday string work.背熟五个方法,它们覆盖大部分日常字符串工作。
  • s.upper(): returns a new string with all characters uppercased. "hello".upper() → "HELLO".返回所有字符大写的新字符串。"hello".upper() → "HELLO"。
  • s.lower(): returns all lowercase. Useful for case-insensitive comparison.返回全小写。用于不区分大小写的比较。
  • s.split(sep): splits on sep and returns a list. "a,b,c".split(",") → ["a","b","c"]. If sep is omitted, splits on any whitespace.按 sep 分割并返回列表。"a,b,c".split(",") → ["a","b","c"]。省略 sep 则按任意空白分割。
  • sep.join(iterable): the inverse of split. "-".join(["a","b","c"]) → "a-b-c".split 的逆操作。"-".join(["a","b","c"]) → "a-b-c"。
  • s.replace(old, new): replaces every occurrence of old with new. "cats and cats".replace("cats","dogs") → "dogs and dogs".将每个 old 替换为 new。"cats and cats".replace("cats","dogs") → "dogs and dogs"。
Curriculum anchor: ON ICS3C A1.2 (verbatim): "capitalize first letter" → upper()/lower(); "extract a portion" → slice + split; "count occurrences" → count() in §5. AB CSE1120 outcome 2.6 names "concatenation and interpolation operators" — join() is the idiomatic Python concatenation tool.课程依据:ON ICS3C A1.2(原文):"将首字母大写" → upper()/lower();"提取一部分" → 切片 + split;"计算出现次数" → count()(见 §5)。AB CSE1120 结果 2.6 点名"拼接和插值运算符"——join() 是 Python 的惯用拼接工具。
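One way to handle the "capitalize first letter" task is to combine a slice with upper() and lower(). The sketch below also shows split and join undoing each other. The sample values ("aLICE", "red,green,blue") are made up for illustration.

```python
name = "aLICE"

# Capitalise the first letter only: slice, change case, concatenate
fixed = name[0].upper() + name[1:].lower()   # "Alice"

# split and join are inverses when the separator is the same
csv_line = "red,green,blue"
parts = csv_line.split(",")        # ["red", "green", "blue"]
rebuilt = ",".join(parts)          # "red,green,blue"

# replace returns a new string; csv_line itself is untouched
dashed = csv_line.replace(",", "-")  # "red-green-blue"

print(fixed, parts, rebuilt == csv_line, dashed)
```

Python also provides s.capitalize(), which does the same job as the slice expression for a single word; the slice version is shown because it reuses only the tools covered so far.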
Worked Example 3 · Normalising a CSV line例题 3 · 规范化 CSV 行

Given row = " Alice , Biology , 92 ", produce a clean list of stripped, lowercased fields.给定 row = " Alice , Biology , 92 ",生成去空格、小写的字段列表。

row = "  Alice , Biology , 92 "
fields = row.split(",")          # ["  Alice ", " Biology ", " 92 "]
clean  = [f.strip().lower() for f in fields]
print(clean)                     # ["alice", "biology", "92"]

以上:split(",") 按逗号分割(方法,method),strip() 去除两端空白,lower() 转为小写。这三个步骤可以链式调用。Chain: split(",") splits on comma (returns list), then strip() removes surrounding whitespace, then lower() normalises case. This pattern — split, strip, lower — is the standard CSV-cleaning idiom.

What does "Hello World".lower() return?"Hello World".lower() 返回什么?
§3 · Q1
"HELLO WORLD""HELLO WORLD"
"Hello world""Hello world"
"hello world""hello world"
"Hello World""Hello World"
lower() converts every character to lowercase, returning a new string. Spaces are unchanged.lower() 将每个字符转换为小写,返回新字符串。空格不变。
lower() lowercases everything. upper() would give "HELLO WORLD". The original string is unchanged because strings are immutable.lower() 将所有字母转为小写。upper() 会得到 "HELLO WORLD"。原字符串不变,因为字符串是不可变的。
What does ",".join(["a","b","c"]) return?",".join(["a","b","c"]) 返回什么?
§3 · Q2
["a","b","c"]["a","b","c"]
"a b c""a b c"
"abc""abc"
"a,b,c""a,b,c"
join() concatenates the list elements with the separator "," between each pair. Result: "a,b,c".join() 将列表元素用分隔符 "," 连接在一起。结果:"a,b,c"
sep.join(list) produces a string, not a list. The separator goes between each element: "a" + "," + "b" + "," + "c" = "a,b,c".sep.join(list) 产生字符串,不是列表。分隔符位于每个元素之间:"a" + "," + "b" + "," + "c" = "a,b,c"。

Concatenation and String Formatting字符串拼接与格式化

Three ways to combine strings — know all three.三种合并字符串的方式 — 三种都要掌握。
  • + operator (concatenation)+ 运算符(拼接) "Hello" + " " + "World" → "Hello World". Only works with strings; use str(n) to convert numbers first."Hello" + " " + "World" → "Hello World"。只能用于字符串;先用 str(n) 将数字转换。
  • f-string (formatted string literal)f-string(格式化字符串字面量) f"Score: {score}". The expression inside {} is evaluated and inserted. Readable and preferred in modern Python.f"Score: {score}"{} 内的表达式被求值并插入。现代 Python 中首选,可读性强。
  • str.format()str.format() "Score: {}".format(score). Older style, still common in legacy code."Score: {}".format(score)。旧式写法,在旧代码中仍常见。
Curriculum anchor: AB CSE1110 outcome 2.4.6 (verbatim): "use assignment, arithmetical and concatenation and interpolation operators." CSE1120 outcome 2.6 repeats concatenation. f-strings are the Python interpolation idiom those outcomes anticipate.课程依据:AB CSE1110 结果 2.4.6(原文):"使用赋值、算术和拼接及插值运算符。" CSE1120 结果 2.6 再次提及拼接。f-string 是这些结果所预期的 Python 插值惯例。
Worked Example 4 · Building a grade report line例题 4 · 生成成绩报告行

Given a student name, subject, and score, produce a formatted report line three different ways.给定学生姓名、科目和分数,用三种不同方式生成格式化的报告行。

name    = "Alice"
subject = "Biology"
score   = 92

# Method 1: + concatenation
line1 = name + " | " + subject + " | " + str(score)

# Method 2: f-string
line2 = f"{name} | {subject} | {score}"

# Method 3: .format()
line3 = "{} | {} | {}".format(name, subject, score)

print(line1)  # Alice | Biology | 92
print(line2)  # Alice | Biology | 92
print(line3)  # Alice | Biology | 92

以上三种方式输出相同。+ 需要 str(score) 转换;f-string(格式化字符串)最简洁,现代 Python 推荐。注意字符串是不可变的(immutable),每次操作都返回新字符串。All three produce the same output. The + method requires explicit str(score) conversion; f-strings handle it automatically and are the modern Python recommendation. Note: every operation returns a new string — strings are immutable.

What does "Score: " + str(85) return?"Score: " + str(85) 返回什么?
§4 · Q1
8585
"Score: 85""Score: 85"
TypeErrorTypeError
"Score: " + 85"Score: " + 85
str(85) converts the integer 85 to the string "85", then + concatenates it with "Score: " to give "Score: 85".str(85) 将整数 85 转换为字符串 "85",然后 + 将其与 "Score: " 拼接,得到 "Score: 85"。
Without str() the + between a string and int raises TypeError. With str(85) the conversion succeeds and produces "Score: 85".没有 str(),字符串和整数之间的 + 会引发 TypeError。有 str(85) 则转换成功,产生 "Score: 85"。
Given x = 7, what does f"Value is {x * 2}" return?给定 x = 7,f"Value is {x * 2}" 返回什么?
§4 · Q2
"Value is 14""Value is 14"
"Value is x * 2""Value is x * 2"
"Value is {x * 2}""Value is {x * 2}"
1414
In an f-string, the expression inside {} is evaluated at runtime. x * 2 = 7 * 2 = 14, so the result is the string "Value is 14".在 f-string 中,{} 内的表达式在运行时被求值。x * 2 = 7 * 2 = 14,因此结果是字符串 "Value is 14"
f-strings evaluate the expression in {} — they do not insert the literal text. x * 2 evaluates to 14, giving "Value is 14".f-string 对 {} 中的表达式求值——不是插入字面文本。x * 2 求值为 14,得到 "Value is 14"。

Searching Within Strings在字符串中搜索

Three search tools — know which returns what.三种搜索工具 — 知道每种返回什么。
  • sub in s: returns True or False. The simplest way to check membership. "cat" in "concatenate" → True.返回 True 或 False。检查成员关系的最简单方式。"cat" in "concatenate" → True。
  • s.find(sub): returns the index of the first occurrence, or -1 if not found. "hello".find("ll") → 2.返回第一次出现的索引,未找到则返回 -1。"hello".find("ll") → 2。
  • s.count(sub): returns the number of non-overlapping occurrences. "banana".count("a") → 3.返回不重叠出现次数。"banana".count("a") → 3。
Curriculum anchor: ON ICS3C A1.2 (verbatim): "count the occurrences of a word or letter" — that is s.count(). AP CSP Topic 3.4 and Skill 4.B assess reading and predicting string-operation results — all three tools above appear on the exam reference sheet as string procedures.课程依据:ON ICS3C A1.2(原文):"计算单词或字母的出现次数"——那就是 s.count()。AP CSP 主题 3.4 和技能 4.B 评估阅读和预测字符串操作结果——以上三种工具都出现在考试参考手册的字符串程序中。
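Because find() returns an integer rather than a boolean, using it directly in an if test is a classic trap. The sketch below, with illustrative variable names, shows why, and also what "non-overlapping" means for count().

```python
text = "banana"   # b=0 a=1 n=2 a=3 n=4 a=5

has_nan = "nan" in text          # True
pos_b = text.find("b")           # 0, found at the very start
pos_x = text.find("x")           # -1, not found

# Pitfall: 0 is falsy and -1 is truthy, so `if text.find(sub):` misleads
found_b_wrong = bool(text.find("b"))   # False, even though "b" is present
found_x_wrong = bool(text.find("x"))   # True, even though "x" is absent
found_b_right = text.find("b") != -1   # True

# "ana" starts at index 1 and at index 3, but the two matches overlap,
# so count() reports only one
ana_count = text.count("ana")    # 1

print(has_nan, pos_b, pos_x, found_b_right, ana_count)  # True 0 -1 True 1
```

When only a yes/no answer is needed, the in operator avoids the problem entirely.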
Worked Example 5 · Analysing a sentence例题 5 · 分析一个句子

Use all three search tools on the sentence "the quick brown fox".在句子 "the quick brown fox" 上使用全部三种搜索工具。

sentence = "the quick brown fox"

# in — membership test
print("fox" in sentence)        # True
print("cat" in sentence)        # False

# find — first index (-1 if absent)
print(sentence.find("quick"))   # 4
print(sentence.find("cat"))     # -1

# count — occurrences
print(sentence.count("o"))      # 2  (brown, fox)
print(sentence.count("the"))    # 1

以上:in 返回布尔值(True/False),find() 返回第一个匹配的索引,count() 计算不重叠的出现次数。"brown" 中的 o 和 "fox" 中的 o 各算一次,共 2 次。in returns a bool. find() returns the start index of the first match (or -1). count() counts non-overlapping occurrences: "o" appears in "brown" (index 12) and "fox" (index 17), for a total of 2.

What does "banana".count("an") return?"banana".count("an") 返回什么?
§5 · Q1
3
1
0
2
"banana": b-a-n-a-n-a. "an" appears at index 1 (a,n) and index 3 (a,n) — two non-overlapping occurrences. Count = 2."banana":b-a-n-a-n-a。"an" 出现在索引 1(a,n)和索引 3(a,n)——两个不重叠的出现。Count = 2。
"an" appears twice in "banana": at positions 1 and 3. count() counts non-overlapping matches, so the answer is 2."an" 在 "banana" 中出现两次:位置 1 和 3。count() 计算不重叠匹配,因此答案是 2。
What does "hello".find("x") return?"hello".find("x") 返回什么?
§5 · Q2
-1
0
False
None
find() returns -1 when the substring is not found. "x" is not in "hello", so the result is -1.find() 在未找到子字符串时返回 -1。"x" 不在 "hello" 中,因此结果是 -1。
find() returns an integer: the index if found, or -1 if not found. It never returns False or None. Use in for a boolean membership test.find() 返回整数:找到则返回索引,未找到则返回 -1。它不会返回 False 或 None。用 in 进行布尔成员测试。

Looping Over Characters遍历字符

Two loop patterns for strings.字符串的两种循环模式。
  • For-each (direct)for-each(直接遍历)
    for ch in s: ch takes each character in turn. Cleaner when you only need the character, not its index.ch 依次取每个字符。当只需要字符而不需要索引时更简洁。
  • For-range (index access)for-range(索引访问)
    for i in range(len(s)): — use s[i] inside the loop. Needed when you must know the position.— 在循环内用 s[i]。当必须知道位置时使用。
Common tasks using character loops: count vowels, count uppercase letters, check if all characters are digits, reverse a string manually. ON ICS3C A1.2 "swap two characters" requires index-based access.字符循环的常见任务:计算元音数、计算大写字母数、检查所有字符是否为数字、手动反转字符串。ON ICS3C A1.2"交换两个字符"需要基于索引的访问。
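The "swap two characters" task needs positions, while counting uppercase letters needs only the characters. The sketch below shows one possible approach to each; swap_chars and count_upper are hypothetical helper names chosen for this example.

```python
def swap_chars(s, i, j):
    """Return a copy of s with the characters at indices i and j exchanged."""
    chars = list(s)                          # lists are mutable, strings are not
    chars[i], chars[j] = chars[j], chars[i]  # swap by position
    return "".join(chars)                    # rebuild a new string


def count_upper(s):
    """Count uppercase letters with a for-each loop (no index needed)."""
    total = 0
    for ch in s:
        if ch.isupper():
            total += 1
    return total


print(swap_chars("hello", 0, 4))   # oellh
print(count_upper("Hello World"))  # 2
```

Converting to a list and back is one common way around immutability; building the result with slices and + would work as well.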
Worked Example 6 · Counting vowels in a word例题 6 · 统计单词中的元音字母

Count how many vowels are in the word "education" using both loop styles.用两种循环方式统计单词 "education" 中的元音字母数量。

word   = "education"
vowels = "aeiou"

# Style 1: for-each (cleaner)
count1 = 0
for ch in word:
    if ch in vowels:
        count1 += 1
print(count1)   # 5

# Style 2: for-range (index-based)
count2 = 0
for i in range(len(word)):
    if word[i] in vowels:
        count2 += 1
print(count2)   # 5

以上:Style 1 用 for ch in word 逐字符(character)遍历;Style 2 用索引(index)访问。"education" 中的元音:e, u, a, i, o,共 5 个。两种方式结果相同。Both styles produce the same result. "education": e(vowel), d, u(vowel), c, a(vowel), t, i(vowel), o(vowel), n — 5 vowels. Style 1 is more Pythonic; Style 2 is needed when the position matters (e.g., to replace word[i]).

What does the following code print?
s = "hello"
result = ""
for ch in s:
    result = ch + result
print(result)
以下代码打印什么?
s = "hello"
result = ""
for ch in s:
    result = ch + result
print(result)
§6 · Q1
"hello""hello"
"helo""helo"
"olleh""olleh"
""""
Each character is prepended (placed before) result. Trace: ""→"h"→"eh"→"leh"→"lleh"→"olleh". The loop reverses the string.每个字符被前置(放在)result 之前。追踪:""→"h"→"eh"→"leh"→"lleh"→"olleh"。该循环反转了字符串。
Each iteration does result = ch + result, which places the new character before the accumulated string. After all 5 characters, the string is reversed: "olleh".每次迭代执行 result = ch + result,将新字符放在累积字符串之前。经过所有 5 个字符后,字符串被反转:"olleh"。
Which loop style is best when you need to know the position of the character you are processing?当你需要知道正在处理的字符的位置时,哪种循环方式最合适?
§6 · Q2
for ch in s:for ch in s:
for i in range(len(s)):for i in range(len(s)):
while s:while s:
for s in range(10):for s in range(10):
for i in range(len(s)) gives both the index i and the character s[i], so you know where each character is. Use this when position matters (e.g., swapping characters, building a new string with changes at specific positions).for i in range(len(s)) 同时给出索引 i 和字符 s[i],因此你知道每个字符的位置。当位置重要时使用此方式(例如,交换字符、在特定位置构建带更改的新字符串)。
for ch in s gives the character but not its index. When you need the position, use for i in range(len(s)) and access the character as s[i].for ch in s 给出字符但不给出其索引。当你需要位置时,使用 for i in range(len(s)) 并以 s[i] 访问字符。

Text-Processing Worked Example: Word Count文本处理综合例题:词频统计

Word-count in three steps — the canonical text-processing pipeline.三步词频统计 — 经典文本处理流水线。
  • Step 1 — Normalise第 1 步——规范化 lower() + strip() so "The" and "the" count as the same word.lower() + strip() 使"The"和"the"被计为同一个词。
  • Step 2 — Split第 2 步——分割 split() with no argument splits on any whitespace, collapsing multiple spaces.— 无参数的 split() 按任意空白分割,合并多个空格。
  • Step 3 — Count第 3 步——计数 — loop over the word list and count occurrences, or use a dictionary for frequency per word.— 遍历词列表并计数,或用字典统计每个词的频率。
Curriculum anchor: ON ICS3C A1.2 (verbatim) names "count the occurrences of a word or letter" — exactly this example. This pipeline combines all six earlier sections: indexing/slicing (§1–2), methods (§3), formatting (§4), searching (§5), and loops (§6).课程依据:ON ICS3C A1.2(原文)点名"计算单词或字母的出现次数"——正是本例。这个流水线结合了前六节的所有内容:索引/切片(§1–2)、方法(§3)、格式化(§4)、搜索(§5)和循环(§6)。
Worked Example 7 · Word-count program例题 7 · 词频统计程序

Given a sentence, count the total number of words, find the most frequent word, and report whether a target word appears.给定一个句子,统计总词数,找出出现最多的词,并报告目标词是否出现。

sentence = "the cat sat on the mat and the cat"

# Step 1: Normalise
text  = sentence.lower().strip()

# Step 2: Split into words
words = text.split()           # ["the","cat","sat","on","the","mat","and","the","cat"]

# Step 3: Count total words
total = len(words)
print(f"Total words: {total}") # Total words: 9

# Step 4: Frequency of each unique word
freq = {}
for w in words:
    if w in freq:
        freq[w] += 1
    else:
        freq[w] = 1
print(freq)
# {'the': 3, 'cat': 2, 'sat': 1, 'on': 1, 'mat': 1, 'and': 1}

# Step 5: Most frequent word
top = max(freq, key=freq.get)
print(f"Most frequent: '{top}' ({freq[top]} times)")
# Most frequent: 'the' (3 times)

# Step 6: Target search
target = "cat"
print(f"'{target}' appears: {'yes' if target in words else 'no'}")
# 'cat' appears: yes

以上流水线:lower() 规范化,split() 分割为词列表,len() 计总词数,字典统计词频(频率),max() 找最高频词,in 检查目标词是否存在。这综合了本单元所有字符串技能。Pipeline: lower() normalises case, split() tokenises (splits into word list), len() counts total words, the dict loop counts frequency per word, max() finds the top word, and in tests membership. This integrates all six earlier sections.
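For comparison, the frequency-counting and most-frequent-word steps can be written more compactly with collections.Counter from the standard library. This is an alternative sketch, not a replacement for the dictionary loop above, which is the version the curriculum expectations ask you to be able to write from scratch.

```python
from collections import Counter

sentence = "the cat sat on the mat and the cat"
words = sentence.lower().split()

freq = Counter(words)                         # dict subclass: word -> count
top_word, top_count = freq.most_common(1)[0]  # most frequent (word, count) pair

print(len(words))                 # 9
print(top_word, top_count)        # the 3
print(freq["cat"], freq["dog"])   # 2 0  (a missing word counts as 0)
```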

Given text = "To be or not to be", what does text.lower().split() return?给定 text = "To be or not to be",text.lower().split() 返回什么?
§7 · Q1
["to","be","or","not","to","be"]["to","be","or","not","to","be"]
["To","be","or","not","to","be"]["To","be","or","not","to","be"]
"to be or not to be""to be or not to be"
66
lower() first converts to all lowercase → "to be or not to be". Then split() splits on whitespace into a list: ["to","be","or","not","to","be"].lower() 先转为全小写 → "to be or not to be"。再 split() 按空白分割成列表:["to","be","or","not","to","be"]。
The chain .lower().split() applies both operations in sequence. Result is a list of lowercase words.链式调用 .lower().split() 按顺序应用两个操作。结果是小写词的列表。
After running the word-count program on "the cat sat on the mat and the cat", what is the total number of words?对 "the cat sat on the mat and the cat" 运行词频统计程序后,总词数是多少?
§7 · Q2
6
8
9
7
Split on spaces: "the","cat","sat","on","the","mat","and","the","cat" — 9 tokens. len(words) = 9.按空格分割:"the","cat","sat","on","the","mat","and","the","cat"——9 个词元。len(words) = 9。
Count the space-separated tokens: the(1) cat(2) sat(3) on(4) the(5) mat(6) and(7) the(8) cat(9). Total = 9.数空格分隔的词元:the(1) cat(2) sat(3) on(4) the(5) mat(6) and(7) the(8) cat(9)。总计 = 9。
Going deeper — removing punctuation before word-count Honors — ICS3C / CSE1120深入 — 计数前去除标点 荣誉 — ICS3C / CSE1120

Real text contains commas, periods, and quotation marks, so "word," and "word" count as different tokens. The standard fix is to strip punctuation from each token before inserting it into the frequency dictionary. One approach: word.strip(".,!?;:\""). A more general solution uses the str.translate() method with str.maketrans() to remove all punctuation at once. ON ICS3C A1.2 names "extract a portion" and "count occurrences" as the assessed skills; handling punctuation cleanly is the distinction between a basic and a polished solution.真实文本包含逗号、句号和引号,因此 "word," 和 "word" 会被计为不同的词元。标准修复是在将每个词元插入频率字典之前去除标点。一种方法:word.strip(".,!?;:\"")。更通用的解决方案使用 str.translate() 方法配合 str.maketrans() 一次去除所有标点。ON ICS3C A1.2 将"提取一部分"和"计算出现次数"列为评估技能;干净地处理标点是基础解法和精良解法之间的区别。
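Both approaches described above might look like the following sketch. It assumes that string.punctuation (the ASCII punctuation characters from the standard library) is an acceptable definition of "punctuation"; the sample sentence is invented for illustration.

```python
import string

text = 'The cat, the "mat", and the cat!'

# Approach 1: strip punctuation from both ends of each token
tokens = [w.strip(string.punctuation) for w in text.lower().split()]

# Approach 2: delete every punctuation character, then split
table = str.maketrans("", "", string.punctuation)
tokens2 = text.lower().translate(table).split()

# Count frequencies, skipping any token that was only punctuation
freq = {}
for w in tokens:
    if w:
        freq[w] = freq.get(w, 0) + 1

print(tokens)   # ['the', 'cat', 'the', 'mat', 'and', 'the', 'cat']
print(freq)     # {'the': 3, 'cat': 2, 'mat': 1, 'and': 1}
```

The two approaches differ inside words: strip() leaves an apostrophe in "don't" alone, while translate() removes it and produces "dont". Which behaviour is wanted depends on the task.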


Exam Strategy and Common Pitfalls考试策略与常见陷阱

Index and slice questions索引和切片题
  • Zero-based indexing.从零开始的索引。 The first character is always index 0, not 1. Every year, students lose marks by writing s[1] for the first character. Negative indices count from the right: s[-1] is always the last character.第一个字符始终是索引 0,不是 1。每年都有学生因写 s[1] 作为第一个字符而失分。负索引从右计数:s[-1] 始终是最后一个字符。
  • Stop is excluded in slices.切片中的 stop 不包含。 s[0:3] gives three characters (indices 0, 1, 2), NOT four. To get the first n characters, write s[:n].s[0:3] 给出三个字符(索引 0、1、2),不是四个。要获取前 n 个字符,写 s[:n]
Method questions (§3–§5)方法题(§3–§5)
  • Strings are immutable.字符串是不可变的。 s.upper() returns a new string — it does NOT change s. Always assign the result: s = s.upper(). Forgetting this is the most common method-question mistake.s.upper() 返回新字符串——它不改变 s。必须赋值:s = s.upper()。忘记这点是方法题最常见的错误。
  • find() returns -1, not False.find() 返回 -1,不是 False Use in for a boolean test; use find() when you need the position. On AP CSP exam: the reference-sheet string operations use 1-based indexing — adjust by one relative to Python.布尔测试用 in;需要位置时用 find()。在 AP CSP 考试中:参考手册字符串操作使用从 1 开始的索引——相对于 Python 调整一位。
String expression tracing (AP CSP Skill 4.B)字符串表达式追踪(AP CSP 技能 4.B)
  • Evaluate method chains left to right.从左到右求值链式调用。 For a chain like "hello".upper().replace("L","X"), each method acts on the result of the one before it. Step 1: "HELLO". Step 2: "HEXXO". Show each intermediate value for full marks.对于链式调用如 "hello".upper().replace("L","X"),每个方法作用于前一个方法的结果。第 1 步:"HELLO"。第 2 步:"HEXXO"。展示每个中间值以获得满分。
  • Count characters, not words.计字符,不是词。 When predicting s.count("x"), count every occurrence of the literal character, not the number of words. Overlapping substrings are NOT counted: "aaa".count("aa") = 1, not 2.预测 s.count("x") 时,计算字面字符的每次出现,不是词数。重叠子字符串不计:"aaa".count("aa") = 1,不是 2。
Formatting and concatenation pitfalls (§4)格式化和拼接陷阱(§4)
  • Convert numbers before + concatenation.+ 拼接前转换数字。 "Score: " + 92 raises a TypeError. Use str(92) or an f-string instead: f"Score: {92}"."Score: " + 92 会引发 TypeError。改用 str(92) 或 f-string:f"Score: {92}"
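Several of the pitfalls above can be reproduced in one short script, which may help when practising traces by hand. The variable names (unchanged, step1, step2, overlap, line) are placeholders for this sketch.

```python
s = "hello"

s.upper()                 # result is discarded; s is still "hello"
unchanged = s

s = s.upper()             # reassign to keep the new string
step1 = s                         # "HELLO"
step2 = step1.replace("L", "X")   # "HEXXO"

overlap = "aaa".count("aa")       # 1, because matches may not overlap

try:
    line = "Score: " + 92         # str + int raises TypeError
except TypeError:
    line = f"Score: {92}"         # the f-string converts the number itself

print(unchanged, step1, step2, overlap, line)
# hello HELLO HEXXO 1 Score: 92
```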

Flashcards闪卡

What is a string?什么是字符串?
A sequence of characters. Immutable in Python — every "change" returns a new string object.字符的序列。Python 中不可变——每次"修改"都返回新字符串对象。
s[0] vs s[-1]?s[0]s[-1] 分别是什么?
s[0] = first character. s[-1] = last character. Indices start at 0; negative indices count from the right.s[0] = 第一个字符。s[-1] = 最后一个字符。索引从 0 开始;负索引从右计数。
Slice syntax rule切片语法规则
s[start:stop] includes start and excludes stop. s[:3] = first 3. s[2:] = from index 2 to end. s[::-1] = reversed.s[start:stop] 包含 start,不含 stop。s[:3] = 前 3 个。s[2:] = 索引 2 到末尾。s[::-1] = 反转。
s.upper() / s.lower()s.upper() / s.lower()
Return new strings with all letters uppercased / lowercased. The original string is unchanged (immutable). Assign the result to use it: s = s.lower().返回所有字母大写/小写的新字符串。原字符串不变(不可变)。需赋值才能使用:s = s.lower()
s.split(sep)s.split(sep)
Splits the string on sep, returns a list. No argument = split on any whitespace. "a,b".split(",") → ["a","b"].sep 处分割字符串,返回列表。无参数 = 按任意空白分割。"a,b".split(",") → ["a","b"]
sep.join(iterable)sep.join(iterable)
Inverse of split. Joins list elements into one string with sep between each pair. "-".join(["a","b","c"]) → "a-b-c".split 的逆操作。将列表元素用 sep 连接成一个字符串。"-".join(["a","b","c"]) → "a-b-c"
s.replace(old, new)s.replace(old, new)
Replaces every occurrence of old with new, returns a new string. "aa".replace("a","b") → "bb".将每个 old 替换为 new,返回新字符串。"aa".replace("a","b") → "bb"
f-string syntaxf-string 语法
f"text {expr}": the expression inside {} is evaluated and inserted. f"Score: {90+2}" → "Score: 92". Preferred in modern Python.f"text {expr}":{} 内的表达式被求值并插入。f"Score: {90+2}" → "Score: 92"。现代 Python 推荐。
sub in ssub in s
Returns True if sub is found anywhere in s, else False. The simplest membership test. "cat" in "concatenate" → True.如果 sub 在 s 中出现则返回 True,否则返回 False。最简单的成员测试。"cat" in "concatenate" → True。
s.find(sub)s.find(sub)
Returns the index of the first occurrence, or -1 if not found. Never returns False or None.返回第一次出现的索引,未找到则返回 -1。不会返回 False 或 None。
s.count(sub)s.count(sub)
Returns the number of non-overlapping occurrences of sub in s. "banana".count("a") → 3. ON ICS3C A1.2 "count occurrences."返回 sub 在 s 中不重叠的出现次数。"banana".count("a") → 3。ON ICS3C A1.2"计算出现次数"。
For-each vs for-range over a string字符串的 for-each 与 for-range
for ch in s: gives character, no index. for i in range(len(s)): gives index and character via s[i]. Use range when position matters.for ch in s:给出字符,无索引。for i in range(len(s)):通过 s[i] 给出索引和字符。当位置重要时用 range。
Word-count pipeline词频统计流水线
1. lower() normalise. 2. split() tokenise. 3. Loop and count with a dict. ON ICS3C A1.2 "count occurrences of a word."1. lower() 规范化。2. split() 分词。3. 用字典循环计数。ON ICS3C A1.2"计算单词出现次数"。
String immutability字符串不可变性
You cannot do s[0] = "X" — raises TypeError. Every method returns a new string. To "modify" s, reassign: s = s.replace("a","b").不能执行 s[0] = "X"——引发 TypeError。每个方法都返回新字符串。要"修改" s,需重新赋值:s = s.replace("a","b")

Practice Quiz综合测验

Given s = "Data", what is s[1]?给定 s = "Data",s[1] 是什么?
Q1
"D""D"
"t""t"
"a""a"
"Data""Data"
D=0, a=1, t=2, a=3. Index 1 = "a".D=0, a=1, t=2, a=3。索引 1 = "a"。
Zero-based indexing: D=0, a=1, t=2, a=3. s[1] = "a".从零开始索引:D=0, a=1, t=2, a=3。s[1] = "a"。
What does "Science"[1:5] return?"Science"[1:5] 返回什么?
Q2
"Scie""Scie"
"cien""cien"
"cience""cience"
"cie""cie"
"Science": S=0,c=1,i=2,e=3,n=4,c=5,e=6. [1:5] = indices 1,2,3,4 = "cien". Stop index 5 excluded."Science":S=0,c=1,i=2,e=3,n=4,c=5,e=6。[1:5] = 索引 1,2,3,4 = "cien"。stop 索引 5 不含。
Slice stop is excluded: [1:5] gives indices 1,2,3,4 = "cien".切片 stop 不含:[1:5] 给出索引 1,2,3,4 = "cien"。
What does "hello world".split() return?"hello world".split() 返回什么?
Q3
["hello", "world"]["hello", "world"]
"hello world""hello world"
["h","e","l","l","o"," ","w","o","r","l","d"]["h","e","l","l","o"," ","w","o","r","l","d"]
22
split() with no argument splits on whitespace, returning a list of words: ["hello", "world"].无参数的 split() 按空白分割,返回词列表:["hello", "world"]
split() returns a list of tokens, not individual characters and not a number.split() 返回词元列表,不是单个字符,也不是数字。
Given name = "Alice", what does f"Hello {name}!" evaluate to?给定 name = "Alice",f"Hello {name}!" 的结果是什么?
Q4
"Hello {name}!"
"Hello name!"
SyntaxError
"Hello Alice!"
In an f-string, {name} is evaluated and replaced with the value of name, which is "Alice". Result: "Hello Alice!".在 f-string 中,{name} 被求值并替换为 name 的值 "Alice"。结果:"Hello Alice!"。
f-strings evaluate {} expressions at runtime. {name} becomes "Alice".f-string 在运行时对 {} 表达式求值。{name} 变为 "Alice"。
What does "mississippi".count("ss") return?"mississippi".count("ss") 返回什么?
Q5
4
2
3
1
"mississippi": m-i-s-s-i-s-s-i-p-p-i. "ss" appears at index 2 (s,s) and index 5 (s,s) — 2 non-overlapping occurrences."mississippi":m-i-s-s-i-s-s-i-p-p-i。"ss" 出现在索引 2(s,s)和索引 5(s,s)——2 个不重叠的出现。
count("ss") counts non-overlapping "ss" substrings. In "mississippi": positions 2 and 5 = 2 occurrences.count("ss") 计算不重叠的 "ss" 子字符串。在 "mississippi" 中:位置 2 和 5 = 2 次出现。
The following code is supposed to count vowels in "code" and print 2. What does it actually print?以下代码本应统计 "code" 中的元音并打印 2。实际打印什么?
s = "code"
count = 0
for ch in s:
    if ch in "aeiou":
        count = 1
print(count)
Q6
2
0
1
4
The bug is count = 1 instead of count += 1. Every time a vowel is found, count is reset to 1, not incremented. "code" has vowels "o" and "e"; the last one sets count = 1. Fix: count += 1.错误是 count = 1 而不是 count += 1。每次找到元音时,count 被重置为 1,而不是递增。"code" 有元音 "o" 和 "e";最后一个将 count 设为 1。修复:count += 1
count = 1 resets count to 1 on every vowel — it doesn't accumulate. The last vowel "e" sets count = 1, so print(count) outputs 1.count = 1 在每次遇到元音时将 count 重置为 1——不累积。最后一个元音 "e" 将 count 设为 1,因此 print(count) 输出 1。
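For comparison, here is the same loop with the fix applied. It only checks lowercase vowels, as the original does; handling uppercase input would need something like s.lower() first.

```python
s = "code"
count = 0
for ch in s:
    if ch in "aeiou":
        count += 1      # accumulate instead of overwriting
print(count)  # 2
```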
Which CSTA standard lists strings explicitly as a "fundamental data structure"? 🇺🇸 CSTA哪个 CSTA 标准明确将字符串列为"基本数据结构"?🇺🇸 CSTA
Q7
3B-AP-12
3A-AP-13
3A-AP-17
3A-DA-09
CSTA 3B-AP-12 Descriptive Statement (verbatim): "Examples could include strings, lists, arrays, stacks, and queues." Strings are the first example named.CSTA 3B-AP-12 描述性说明(原文):"示例可包括字符串、列表、数组、栈和队列。"字符串是第一个命名的示例。
3B-AP-12 = fundamental data structures (strings, lists, arrays…). 3A-AP-13 = prototypes. 3A-AP-17 = decomposition. 3A-DA-09 = bit representations.3B-AP-12 = 基本数据结构(字符串、列表、数组……)。3A-AP-13 = 原型。3A-AP-17 = 分解。3A-DA-09 = 位表示。

Readiness Checklist准备就绪清单

Tick each item when you can do it cold, without notes, on a first attempt.能在无笔记、首次尝试下完成,再勾选每一项。


What This Feeds Into本单元的去向

String manipulation is a prerequisite skill for almost every real program. Within the HS Computer Science series, the data-structures and searching units depend on your ability to tokenise text, compare strings, and loop character by character. Both downstream AP courses test string operations directly. The link below points to the AP CSA guide confirmed to exist in this repo; AP CSA Unit 1 introduces Java String methods (which closely parallel the Python methods in this guide).字符串操作是几乎所有真实程序的先决技能。在 HS Computer Science 系列中,数据结构和搜索单元依赖于你分词、比较字符串和逐字符循环的能力。两门下游 AP 课程都直接测试字符串操作。以下链接指向本仓库中已确认存在的 AP CSA 指南;AP CSA Unit 1 介绍 Java String 方法(与本指南中的 Python 方法高度对应)。

Within High School Computer Science.在 HS Computer Science 内部。

The Data Structures unit builds on split() and tokenisation to process lists of strings. The Searching and Sorting unit applies the character-comparison logic you practiced in §5 to sort strings alphabetically. The Software Development unit uses the same lower().strip() normalisation pipeline for input validation. AP CSP Topic 3.4 (Strings) directly tests the indexing, slicing, and method knowledge from §1–§3 of this guide.数据结构单元在 split() 和分词的基础上处理字符串列表。搜索与排序单元将 §5 中练习的字符比较逻辑应用于按字母顺序排序字符串。软件开发单元将相同的 lower().strip() 规范化流水线用于输入验证。AP CSP 主题 3.4(字符串)直接测试本指南 §1–§3 中的索引、切片和方法知识。
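As a rough sketch of the lower().strip() normalisation mentioned above, the snippet below cleans a piece of user input before comparing it with a list of accepted answers. The helper normalise, the list valid_answers and the sample input are all hypothetical, not taken from the Software Development unit.

```python
def normalise(raw):
    """Hypothetical helper: lowercase the text and trim surrounding whitespace."""
    return raw.lower().strip()

valid_answers = ["yes", "no"]   # assumed set of accepted answers
raw_text = "  YES  "            # stands in for a value typed by a user

answer = normalise(raw_text)
print(answer)                   # yes
print(answer in valid_answers)  # True
```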

AP feeder links (existing in this repo).AP 衔接链接(本仓库中已有)。

AP CSA Unit 1 · Using Objects and Methods (Java String methods: length(), substring(), indexOf(), equals() map directly to the Python string skills from this guide)AP CSA Unit 1 · 使用对象和方法(Java String 方法:length()、substring()、indexOf()、equals() 直接对应本指南中的 Python 字符串技能)
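To make the correspondence concrete, the sketch below shows a Python expression for each of the four Java methods named above, with the Java call in a comment. The pairing is an illustration, not an official mapping from either course.

```python
s = "Computer"

length = len(s)             # Java: s.length()
middle = s[3:6]             # Java: s.substring(3, 6)
position = s.find("put")    # Java: s.indexOf("put")
missing = s.find("xyz")     # Java: s.indexOf("xyz"); both give -1 when not found
same = (s == "Computer")    # Java: s.equals("Computer")

print(length, middle, position, missing, same)  # 8 put 3 -1 True
```

One difference worth remembering: Python compares string contents with ==, while Java uses equals() for that purpose.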

AP CSP Topic 3.4 (Strings) is the direct exam anchor: the reference-sheet string operations (substring, concatenation, length) all appear in the AP CSP free-response and multiple-choice sections. Skill 4.B ("Determine the result of code segments") means you must be able to predict a string expression's output without running it — exactly the tracing practice in the worked examples here.AP CSP 主题 3.4(字符串)是直接的考试依据:参考手册字符串操作(子字符串、拼接、长度)都出现在 AP CSP 简答题和选择题部分。技能 4.B("确定代码段的结果")意味着你必须能够不运行代码就预测字符串表达式的输出——正是本指南例题中练习的追踪技能。