String Subscripting and Performance in Swift
Subscripting
String
Subscripting a String with a String.Index yields a Character. The String.Index must be obtained from the String itself.
let greeting = "Guten Tag!"
greeting[greeting.startIndex] // Character "G"
greeting[greeting.index(before: greeting.endIndex)] // Character "!"
greeting[greeting.index(after: greeting.startIndex)] // Character "u"
let index = greeting.index(greeting.startIndex, offsetBy: 7)
greeting[index] // Character "a"
Subscripting a String with a Range<String.Index> or ClosedRange<String.Index> (both referred to as Range below) yields a String.
let str = "abc"
str[str.startIndex..<str.index(after: str.startIndex)] // String "a"
str[str.startIndex...str.index(after: str.startIndex)] // String "ab"
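The offset-based lookups above trap at runtime when the offset runs past the end of the string. The following is a minimal sketch of a bounds-checked lookup, assuming Swift 5, where String is itself a Collection of Character. The helper name `character(in:atOffset:)` is hypothetical and not part of the standard library; `index(_:offsetBy:limitedBy:)` is the standard Collection method.

```swift
/// Returns the Character at `offset` from the start of `string`,
/// or nil when the offset is out of range.
/// Hypothetical helper for illustration; not a standard-library API.
func character(in string: String, atOffset offset: Int) -> Character? {
    guard offset >= 0,
          let i = string.index(string.startIndex,
                               offsetBy: offset,
                               limitedBy: string.endIndex),
          i < string.endIndex else {   // an offset equal to count lands on endIndex
        return nil
    }
    return string[i]
}

let greetingText = "Guten Tag!"
let seventh = character(in: greetingText, atOffset: 7)   // Optional("a")
let missing = character(in: greetingText, atOffset: 42)  // nil
```

The lookup still walks the string from startIndex, so it is O(n) in the offset; it only removes the crash, not the cost.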
Character
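The characters property of a String returns a String.CharacterView, which represents the content as displayed on screen. Subscripting a String.CharacterView with a String.CharacterView.Index yields a Character. The String.CharacterView.Index must be obtained from the String.CharacterView.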
let str = "abc"
let characters = str.characters // String.CharacterView
characters[characters.startIndex] // Character "a"
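Note that String.CharacterView does not conform to RandomAccessCollection, so a String.CharacterView.Index does not give random access: reaching a given offset means stepping through the view. Also, String.CharacterView.Index and String.Index are the same type, a struct. The documentation for String.Index is found under the String documentation: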
typealias Index = String.CharacterView.Index
Subscripting a String.CharacterView with a Range<String.CharacterView.Index> yields a String.CharacterView. A String can be created from either a Character or a String.CharacterView.
let str = "abc"
let characters = str.characters // String.CharacterView
let characters2 = characters[characters.startIndex..<characters.index(after: characters.startIndex)] // String.CharacterView
String(characters.first!) == String(characters2) // true. characters.first! is Character
An Array<Character> created from a String.CharacterView can be subscripted with an Int or a Range<Int>. A String can also be created from an Array<Character>.
let str = "abc"
let arr = Array(str.characters) // Array<Character> ["a", "b", "c"]
arr[1] // Character "b"
arr[1...2] // ArraySlice<Character> ["b", "c"]
String(arr) // String "abc"
A Character can be compared directly with a literal such as "a".
let str = "abc"
let a = str[str.startIndex] // Character "a"
let b = str[str.index(str.startIndex, offsetBy: 1)] // Character "b"
a == "a" // true
b > "a" // true
UTF-8
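The utf8 property of a String returns a String.UTF8View, which represents the content in UTF-8 encoding. Subscripting a String.UTF8View with a String.UTF8View.Index yields a UTF8.CodeUnit, which is actually a UInt8; subscripting with a Range<String.UTF8View.Index> yields a String.UTF8View. The String.UTF8View.Index must be obtained from the String.UTF8View. String.UTF8View does not conform to RandomAccessCollection, so a String.UTF8View.Index does not give random access. An Array<UInt8> created from a String.UTF8View can be subscripted with an Int or a Range<Int>. A String can be created from a String.UTF8View. A String can also be created from a UInt8 or an Array<UInt8>, but the result is the textual form of the number or of the number array, not the text that those numbers encode in UTF-8.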
let str = "abc"
let utf8 = str.utf8 // String.UTF8View
let n = utf8[utf8.startIndex] // UInt8 97
let a = utf8[utf8.startIndex..<utf8.index(after: utf8.startIndex)] // String.UTF8View "a"
let ab = utf8[utf8.startIndex...utf8.index(after: utf8.startIndex)] // String.UTF8View "ab"
String(n) // "97", NOT "a"
String(a) // "a"
String(ab) // "ab"
let arr = Array(utf8) // Array<UInt8> [97, 98, 99]
let n2 = arr[0] // UInt8 97
let arr2 = arr[0...1] // ArraySlice<UInt8> [97, 98]
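As noted above, String(n) formats the number rather than decoding it. The following is a minimal sketch of turning UTF-8 code units back into text, assuming Swift 5, where `String(decoding:as:)` and the non-failable `UnicodeScalar(_: UInt8)` initializer are available in the standard library. The variable names are illustrative only.

```swift
let bytes: [UInt8] = Array("abc".utf8)                // [97, 98, 99]

// Formatting the number: yields the digits, not the letter.
let digits = String(bytes[0])                          // "97"

// Decoding a single ASCII byte: go through UnicodeScalar to Character.
// This is only meaningful for values below 128.
let letter = Character(UnicodeScalar(bytes[0]))        // "a"

// Decoding the whole buffer as UTF-8.
let decoded = String(decoding: bytes, as: UTF8.self)   // "abc"

// A slice decodes the same way, since any Collection of UInt8 is accepted.
let firstTwo = String(decoding: bytes[0...1], as: UTF8.self) // "ab"
```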
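The utf8CString property of a String returns a ContiguousArray<CChar>, which is actually a ContiguousArray<Int8>. It holds the UTF-8 encoded content followed by a terminating 0, so its count is 1 greater than that of the utf8 property. A ContiguousArray<Int8> can be subscripted with an Int or a Range<Int>, yielding an Int8 or an ArraySlice<Int8> respectively. ContiguousArray conforms to RandomAccessCollection, so subscripting with an Int is random access.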
let str = "abc"
let utf8 = str.utf8CString // ContiguousArray<Int8> [97, 98, 99, 0]
let a = utf8[0] // Int8 97
let ab = utf8[0...1] // ArraySlice<Int8> [97, 98]
UTF-16
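The utf16 property of a String returns a String.UTF16View, which represents the content in UTF-16 encoding. Subscripting a String.UTF16View with a String.UTF16View.Index yields a UTF16.CodeUnit, which is actually a UInt16; subscripting with a Range<String.UTF16View.Index> yields a String.UTF16View. The String.UTF16View.Index must be obtained from the String.UTF16View. String.UTF16View conforms to RandomAccessCollection, so subscripting with a String.UTF16View.Index is random access. An Array<UInt16> created from a String.UTF16View can be subscripted with an Int or a Range<Int>. A String can be created from a String.UTF16View. A String can also be created from a UInt16 or an Array<UInt16>, but the result is the textual form of the number or of the number array, not the text that those numbers encode in UTF-16.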
let str = "abc"
let utf16 = str.utf16 // String.UTF16View
let n = utf16[utf16.startIndex] // UInt16 97
let a = utf16[utf16.startIndex..<utf16.index(after: utf16.startIndex)] // String.UTF16View "a"
let ab = utf16[utf16.startIndex...utf16.index(after: utf16.startIndex)] // String.UTF16View "ab"
String(n) // "97", NOT "a"
String(a) // "a"
String(ab) // "ab"
let arr = Array(utf16) // Array<UInt16> [97, 98, 99]
let n2 = arr[0] // UInt16 97
let arr2 = arr[0...1] // ArraySlice<UInt16> [97, 98]
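For text outside the Basic Multilingual Plane, one Character occupies two UTF-16 code units (a surrogate pair), which is why UTF-16 offsets and Character offsets can disagree. The following is a small sketch assuming Swift 5 and its standard `String(decoding:as:)` initializer; the emoji is written as a Unicode escape and was chosen only for illustration.

```swift
let smile = "\u{1F600}"                  // one emoji, code point U+1F600
let units = Array(smile.utf16)           // [0xD83D, 0xDE00], a surrogate pair

// Decoding the code units restores the original text.
let roundTrip = String(decoding: units, as: UTF16.self)

print(smile.count)        // 1 Character
print(units.count)        // 2 UTF-16 code units
print(roundTrip == smile) // true
```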
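Performance comparison
The following operations are timed on String, String.CharacterView, Array<Character>, String.UTF8View, Array<UInt8>, ContiguousArray<Int8>, String.UTF16View, and Array<UInt16>: emptiness check (isEmpty), length (count), subscripting a single position ([index]), and subscripting a range ([range]).
Define the test types, a function that prints and updates the time, and the String under test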
import Foundation
enum TestType {
    case isEmpty
    case count
    case index
    case range
}
// Prints the time elapsed since `date`, then resets `date` to now.
func printAndUpdateTime(_ date: inout Date) {
    let now = Date()
    print(now.timeIntervalSince(date))
    date = now
}
let s = "aasdfsdfsdfgfdsg vrutj7edbj7 ergcwhmkl5lknjklqawkrcqjljkljqjlqjhbrlqwfcbhafcpiluioufnlkqjvjakjn fnvjalgkhlkdkjlkasdfsdfsdfgfdsg vrutj7edbj7 ergcwhmkl5lknjklqawkrcqjliopjktyuljqjlqjhbrlqwfcbhafciluioufnlkjvjakjn fnvjalgkhlkdkjlkasdfsdfsdfgfd"
Test code
let loopCount = 10000
let index = s.characters.count / 2
let testType: TestType = .range
print(testType)
var date = Date()
forLoop: for _ in 0..<loopCount {
    switch testType {
    case .isEmpty:
        _ = s.isEmpty
    case .count:
        break forLoop // count is not measured on String itself; leave the loop
    case .index:
        _ = s[s.index(s.startIndex, offsetBy: index)]
    case .range:
        let endIndex = s.index(s.startIndex, offsetBy: index)
        _ = s[s.startIndex..<endIndex]
    }
}
if testType == .count {
    date = Date()
} else {
    print("String")
    printAndUpdateTime(&date)
}
let characters = s.characters
for _ in 0..<loopCount {
    switch testType {
    case .isEmpty:
        _ = characters.isEmpty
    case .count:
        _ = characters.count
    case .index:
        _ = characters[characters.index(characters.startIndex, offsetBy: index)]
    case .range:
        let endIndex = characters.index(characters.startIndex, offsetBy: index)
        _ = characters[characters.startIndex..<endIndex]
    }
}
print("Characters")
printAndUpdateTime(&date)
let characterArr = Array(characters)
for _ in 0..<loopCount {
    switch testType {
    case .isEmpty:
        _ = characterArr.isEmpty
    case .count:
        _ = characterArr.count
    case .index:
        _ = characterArr[index]
    case .range:
        _ = characterArr[0..<index]
    }
}
print("Characters array")
printAndUpdateTime(&date)
let utf8 = s.utf8
for _ in 0..<loopCount {
    switch testType {
    case .isEmpty:
        _ = utf8.isEmpty
    case .count:
        _ = utf8.count
    case .index:
        _ = utf8[utf8.index(utf8.startIndex, offsetBy: index)]
    case .range:
        let endIndex = utf8.index(utf8.startIndex, offsetBy: index)
        _ = utf8[utf8.startIndex..<endIndex]
    }
}
print("UTF-8")
printAndUpdateTime(&date)
let utf8Arr = Array(utf8)
for _ in 0..<loopCount {
    switch testType {
    case .isEmpty:
        _ = utf8Arr.isEmpty
    case .count:
        _ = utf8Arr.count
    case .index:
        _ = utf8Arr[index]
    case .range:
        _ = utf8Arr[0..<index]
    }
}
print("UTF-8 array")
printAndUpdateTime(&date)
let utf8CString = s.utf8CString
for _ in 0..<loopCount {
    switch testType {
    case .isEmpty:
        _ = utf8CString.isEmpty
    case .count:
        _ = utf8CString.count
    case .index:
        _ = utf8CString[index]
    case .range:
        _ = utf8CString[0..<index]
    }
}
print("UTF-8 C string")
printAndUpdateTime(&date)
let utf16 = s.utf16
for _ in 0..<loopCount {
    switch testType {
    case .isEmpty:
        _ = utf16.isEmpty
    case .count:
        _ = utf16.count
    case .index:
        _ = utf16[utf16.index(utf16.startIndex, offsetBy: index)]
    case .range:
        let endIndex = utf16.index(utf16.startIndex, offsetBy: index)
        _ = utf16[utf16.startIndex..<endIndex]
    }
}
print("UTF-16")
printAndUpdateTime(&date)
let utf16Arr = Array(utf16)
for _ in 0..<loopCount {
    switch testType {
    case .isEmpty:
        _ = utf16Arr.isEmpty
    case .count:
        _ = utf16Arr.count
    case .index:
        _ = utf16Arr[index]
    case .range:
        _ = utf16Arr[0..<index]
    }
}
print("UTF-16 array")
printAndUpdateTime(&date)
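The test code repeats the same loop-and-print pattern for every type. One possible way to factor it is sketched below, assuming Swift 5 (where String is a Collection of Character). The `measure` helper, its signature, and the sample string are hypothetical and not part of the original test; in optimized builds the compiler may also remove work whose result is discarded, so the timings are indicative only.

```swift
import Foundation

/// Runs `body` `iterations` times and returns the elapsed wall-clock seconds.
/// Hypothetical helper; name and signature are assumptions for illustration.
@discardableResult
func measure(_ label: String, iterations: Int = 10_000, _ body: () -> Void) -> TimeInterval {
    let start = Date()
    for _ in 0..<iterations {
        body()
    }
    let elapsed = Date().timeIntervalSince(start)
    print("\(label): \(elapsed)")
    return elapsed
}

let text = String(repeating: "abcdefghij", count: 30)
let half = text.count / 2
let chars = Array(text)   // Array<Character>, converted once

// The index is advanced from startIndex on every call: O(n) per access.
measure("String index") {
    _ = text[text.index(text.startIndex, offsetBy: half)]
}

// Array supports random access by Int: O(1) per access.
measure("Array<Character> index") {
    _ = chars[half]
}
```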
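Test results
Emptiness check
Length
Subscripting a single position
Subscripting a range
In the comparisons above, accessing isEmpty on the String itself is the fastest way to check whether a String is empty. For the other operations, the types that conform to RandomAccessCollection (ContiguousArray<Int8>, String.UTF16View, and the Arrays) are more efficient.
A closer look at the emptiness check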
let loopCount = 10000
var date = Date()
for _ in 0..<loopCount {
    _ = s.isEmpty
}
print("isEmpty")
printAndUpdateTime(&date)
for _ in 0..<loopCount {
    _ = s == ""
}
print("== \"\"")
printAndUpdateTime(&date)
In this test, comparing the String with the empty String was even faster than accessing isEmpty!
Note how the documentation describes the Range subscripts of String.UTF8View and String.UTF16View:
subscript(bounds: Range<String.UTF8View.Index>) -> String.UTF8View { get }
subscript(bounds: Range<String.UTF16View.Index>) -> String.UTF16View { get }
Complexity: O(n) if the underlying string is bridged from Objective-C, where n is the length of the string; otherwise, O(1).
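That is, if the String is bridged from an Objective-C NSString, the time complexity is O(n); otherwise it is O(1). How should this be understood? As noted earlier, String.UTF8View does not conform to RandomAccessCollection while String.UTF16View does, so one would expect their time complexities to differ. Why does the documentation tie the complexity to whether the String is bridged from NSString? The following test looks into this.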
let s2 = NSString(string: s) as String
let loopCount = 10000
let index = s.characters.count / 2
let index2 = s.characters.count - 1
func test(_ s: String) {
    var date = Date()
    let utf8 = s.utf8
    for _ in 0..<loopCount {
        _ = utf8[utf8.startIndex..<utf8.index(utf8.startIndex, offsetBy: index)]
    }
    print("UTF-8 index")
    printAndUpdateTime(&date)
    for _ in 0..<loopCount {
        _ = utf8[utf8.startIndex..<utf8.index(utf8.startIndex, offsetBy: index2)]
    }
    print("UTF-8 index2")
    printAndUpdateTime(&date)
    let utf16 = s.utf16
    for _ in 0..<loopCount {
        _ = utf16[utf16.startIndex..<utf16.index(utf16.startIndex, offsetBy: index)]
    }
    print("UTF-16 index")
    printAndUpdateTime(&date)
    for _ in 0..<loopCount {
        _ = utf16[utf16.startIndex..<utf16.index(utf16.startIndex, offsetBy: index2)]
    }
    print("UTF-16 index2")
    printAndUpdateTime(&date)
}
print("String")
test(s)
print("\nString bridged from NSString")
test(s2)
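Test results
Compare index with index2 first. The index2 parameter is about twice index. For UTF-8, the index2 run also takes about twice as long as the index run; for UTF-16, the index and index2 runs take about the same time. This matches whether each view conforms to RandomAccessCollection.
Now compare the native String with the String bridged from NSString. The bridged String takes longer than the native String, especially for UTF-8. This is presumably the case the documentation describes: when subscripting with a Range, a String bridged from NSString performs some extra work that costs an additional O(n), rather than the subscript itself being O(n).
Application
In practice, which encoding and which way of subscripting should be chosen? First, the encoding depends on the use case. The length of a string can differ between encodings. A string containing only English letters is easy to handle; a string containing Chinese or Emoji calls for a careful choice of encoding. Note that the length property of NSString corresponds to the UTF-16 encoding.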
let str = "abc"
str.characters.count // 3
str.unicodeScalars.count // 3
str.utf16.count // 3
(str as NSString).length // 3
str.utf8.count // 3
str.utf8CString.count - 1 // 3
strlen(str) // 3
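let emojiStr = "🇨🇳" // a flag emoji (one Character, two regional-indicator scalars) gives these counts
emojiStr.characters.count // 1
emojiStr.unicodeScalars.count // 2
emojiStr.utf16.count // 4
(emojiStr as NSString).length // 4
emojiStr.utf8.count // 8
emojiStr.utf8CString.count - 1 // 8
strlen(emojiStr) // 8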
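let ChineseStr = "中文"
ChineseStr.characters.count // 2
ChineseStr.unicodeScalars.count // 2
ChineseStr.utf16.count // 2
(ChineseStr as NSString).length // 2
ChineseStr.utf8.count // 6
ChineseStr.utf8CString.count - 1 // 6
strlen(ChineseStr) // 6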
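The same comparison can be written against the current standard library. The sketch below assumes Swift 5, where the characters property no longer exists and count on String itself counts Characters. The `lengths(of:)` helper and its tuple labels are hypothetical, and the flag emoji is an illustrative choice written with Unicode escapes.

```swift
/// Length of `s` under each view. Hypothetical helper for illustration.
func lengths(of s: String) -> (characters: Int, scalars: Int, utf16: Int, utf8: Int) {
    return (s.count, s.unicodeScalars.count, s.utf16.count, s.utf8.count)
}

let flag = "\u{1F1E8}\u{1F1F3}"    // one flag emoji built from two regional-indicator scalars
let chinese = "\u{4E2D}\u{6587}"   // the two-character string "中文"

print(lengths(of: "abc"))    // (characters: 3, scalars: 3, utf16: 3, utf8: 3)
print(lengths(of: flag))     // (characters: 1, scalars: 2, utf16: 4, utf8: 8)
print(lengths(of: chinese))  // (characters: 2, scalars: 2, utf16: 2, utf8: 6)
```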
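In general, when a string is meant for display, use String.CharacterView. When subscripting is needed and performance matters, create an Array<Character> from the String.CharacterView. For other encodings, the corresponding Array can be created as well; subscripting with an Int or a Range<Int> is efficient and keeps the code concise.
LeetCode now supports Swift. For LeetCode problems that subscript a string many times, using a type that does not conform to RandomAccessCollection (for example String.CharacterView or String.UTF8View) can end in "Time Limit Exceeded" even when the approach is correct. See:
To avoid the performance cost of NSString bridging, prefer String in Swift, keep Objective-C code to a minimum, and prefer third-party libraries written in Swift.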
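For the repeated-subscripting case described above, the usual pattern is to convert the string to an Array once and then index by Int. The following is a minimal sketch assuming Swift 5; `isPalindrome` is a hypothetical example function and is not taken from any particular LeetCode problem.

```swift
/// Two-pointer palindrome check over an Array<Character>.
/// Hypothetical example function for illustration.
func isPalindrome(_ s: String) -> Bool {
    let chars = Array(s)            // one O(n) conversion up front
    var left = 0
    var right = chars.count - 1
    while left < right {
        if chars[left] != chars[right] {   // each access is O(1)
            return false
        }
        left += 1
        right -= 1
    }
    return true
}

print(isPalindrome("racecar"))  // true
print(isPalindrome("abca"))     // false
```

Doing the same comparison with s.index(s.startIndex, offsetBy:) inside the loop would walk the string on every access and turn the O(n) check into O(n²).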
