Introducing Julia/Strings and characters
Strings and characters
A string is a sequence of characters, usually enclosed in double quotes:

"this is a string"
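When writing strings, be careful with two particular characters: the double quote (") and the dollar sign ($). If you want to include a double quote character in a string, you must precede it with a backslash. Otherwise the rest of the string is interpreted as Julia code, possibly with interesting results. If you want to include a dollar sign ($) in a string, you should also precede it with a backslash, because the dollar sign is used for string interpolation.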
julia> demand = "You owe me \$50!"
"You owe me \$50!"

julia> println(demand)
You owe me $50!
julia> demandquote = "He said, \"You owe me \$50!\""
"He said, \"You owe me \$50!\""
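If you enclose a string in triple double quotes, you can use ordinary double quotes inside it without escaping them:

julia> """this is "a" string"""
"this is \"a\" string"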
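Some strings have one or more characters immediately before the opening double quote:

r" "     indicates a regular expression
v" "     indicates a version string
b" "     indicates a byte literal
raw" "   indicates a raw string, which doesn't do interpolation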
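The sketch below makes the four prefixes concrete by showing what kind of object each literal produces. The variable names and sample values are only illustrative. It also hints at why version strings exist: VersionNumber values compare numerically, whereas plain strings compare character by character.

```julia
pattern    = r"[aeiou]"       # a Regex object, usable with occursin, match, replace, ...
version    = v"1.6.2"         # a VersionNumber with major, minor and patch fields
name_bytes = b"Holmes"        # the UTF-8 bytes (UInt8 values) of "Holmes"
price      = raw"costs $50"   # raw string: the $ is kept literally, no interpolation

# Version numbers compare numerically: 1.10 comes after 1.9...
newer = v"1.10" > v"1.9"
# ...whereas the plain strings "1.10" and "1.9" compare character by character
string_order = "1.10" < "1.9"
```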
String interpolation
You often want to use the results of Julia expressions inside strings. For example, suppose you want to say:
"The value of x is n."
where n is the current value of x. Any Julia expression can be inserted into a string with the $() construction:
julia> x = 42
42

julia> "The value of x is $(x)."
"The value of x is 42."
When the expression is just a variable name, you can leave out the parentheses:

julia> "The value of x is $x."
"The value of x is 42."
To include the result of a Julia expression in a string, enclose the expression in parentheses, then precede it with a dollar sign:
julia> "The value of 2 + 2 is $(2 + 2)."
"The value of 2 + 2 is 4."
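The sketch below illustrates the interpolation rules described above, using a made-up variable name. Only the identifier directly after $ is interpolated. Anything more complex, such as a function call or a field access, needs the $() form.

```julia
name = "Watson"

# A bare variable name needs no parentheses
s1 = "Dear $name, how are you?"
# A function call (or any other expression) must be wrapped in $()
s2 = "Your name has $(length(name)) letters."
# Only the identifier `name` is interpolated here; ".length" stays as literal text
s3 = "$name.length"
# An escaped dollar sign is not interpolated
s4 = "That will be \$5, $name."
```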
To extract a smaller string from a string, use getindex(s, range) or s[range]:
julia> s = String("a load of characters")
"a load of characters"

julia> s[1:end]
"a load of characters"

julia> s[3:6]
"load"
julia> s[3:end-6]
"load of char"
You can iterate through a string one character at a time:

julia> for char in s
           print(char, "_")
       end
a_ _l_o_a_d_ _o_f_ _c_h_a_r_a_c_t_e_r_s_
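Notice the difference between the two kinds of indexing. A range, even a one-element range, returns a String, while a single number returns a Char:

julia> s[1:1]
"a"

julia> s[1]
'a'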
Unicode strings
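Not all strings are ASCII. To access individual characters in a Unicode string, you can't always use simple indexing, because some characters occupy more than one index position. Don't be fooled just because some of the indices seem to work: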
julia> su = String("AéB𐅍CD")
"AéB𐅍CD"

julia> su[1]
'A'

julia> su[2]
'é'

julia> su[3]
ERROR: StringIndexError: invalid index [3], valid nearby indices [2]=>'é', [4]=>'B'
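To find the last valid index, use lastindex(str) rather than length(str). length() counts characters, but the last character of su starts at index 10:

julia> length(su)
6

julia> lastindex(su)
10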
The isascii() function tests whether a string is ASCII or contains Unicode characters:

julia> isascii(su)
false
In this string, the "second" character, é, takes 2 bytes, and the "fourth" character, 𐅍, takes 4 bytes.
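You can check these byte counts yourself by comparing length(), which counts characters, with ncodeunits(), which counts the UTF-8 code units (bytes) of a String. This is a minimal sketch. The string is written with \u and \U escapes so that it doesn't depend on how your editor handles the 4-byte character.

```julia
su = "A\u00e9B\U1014dCD"      # the same string as "AéB𐅍CD"

nchars = length(su)           # number of characters
nbytes = ncodeunits(su)       # number of bytes; sizeof(su) gives the same for a String
per_char = [ncodeunits(string(c)) for c in su]   # bytes used by each character
starts = collect(eachindex(su))                  # index where each character begins
```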
There are some useful functions for dealing with such strings, including thisind(), nextind(), and prevind():
julia> for i in eachindex(su)
           println(thisind(su, i), " -> ", su[i])
       end
1 -> A
2 -> é
4 -> B
5 -> 𐅍
9 -> C
10 -> D
The "third" character, B, starts at index 4 of the string.
You can also use eachindex() to index into the string directly:

julia> for charindex in eachindex(su)
           @show su[charindex]
       end
su[charindex] = 'A'
su[charindex] = 'é'
su[charindex] = 'B'
su[charindex] = '𐅍'
su[charindex] = 'C'
su[charindex] = 'D'
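The next sketch uses nextind() to hop from character to character, which is essentially what eachindex() does for you. The helper name char_starts is hypothetical and is not part of Julia. The sketch also uses first() and last() with a count. These work in characters rather than bytes, so they are safe on Unicode strings.

```julia
# Collect the starting index of every character by hopping with nextind()
function char_starts(s::AbstractString)
    idxs = Int[]
    i = firstindex(s)
    while i <= lastindex(s)
        push!(idxs, i)
        i = nextind(s, i)     # jump over all the bytes of the current character
    end
    return idxs
end

su = "A\u00e9B\U1014dCD"      # the same string as "AéB𐅍CD"
starts = char_starts(su)
inside = thisind(su, 7)       # byte 7 is in the middle of 𐅍, which starts at index 5
before = prevind(su, 9)       # the character before 'C' (index 9) starts at index 5

# first/last with a count take whole characters, never half of one
head = first(su, 3)
tail = last(su, 2)
```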
Splitting and joining strings
You can use the multiply operator (*) to glue strings together (a process usually called concatenation):
julia> "s" * "t"
"st"
The + operator doesn't work on strings:

julia> "s" + "t"
ERROR: MethodError: no method matching +(::String, ::String)

so use * instead.
The ^ operator repeats a string:

julia> "s" ^ 18
"ssssssssssssssssss"
You can also use string() to concatenate:

julia> string("s", "t")
"st"
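Here is a minimal sketch of the concatenation options mentioned above, with illustrative sample values. Note that string() also accepts non-string arguments, such as numbers and characters, and converts them for you. repeat() does the same job as ^.

```julia
full_name = "Sherlock" * " " * "Holmes"        # * only accepts strings and characters
address = string("Baker Street, ", 221, 'B')   # string() converts the Int and the Char
laugh = repeat("ha", 3)                        # equivalent to "ha" ^ 3
```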
To split a string, use split():

julia> s = "You know my methods, Watson."
"You know my methods, Watson."
Called with just a string, split() splits it at whitespace:

julia> split(s)
5-element Array{SubString{String},1}:
 "You"
 "know"
 "my"
 "methods,"
 "Watson."
You can also specify a string of one or more characters to split at:
julia> split(s, "e")
2-element Array{SubString{String},1}:
 "You know my m"
 "thods, Watson."

julia> split(s, " m")
3-element Array{SubString{String},1}:
 "You know"
 "y"
 "ethods, Watson."
julia> split(s, "hod")
2-element Array{SubString{String},1}:
 "You know my met"
 "s, Watson."
julia> split(s, "")
28-element Array{SubString{String},1}:
 "Y"
 "o"
 "u"
 " "
 "k"
 "n"
 "o"
 "w"
 " "
 "m"
 "y"
 " "
 "m"
 "e"
 "t"
 "h"
 "o"
 "d"
 "s"
 ","
 " "
 "W"
 "a"
 "t"
 "s"
 "o"
 "n"
 "."
You can also split a string using a regular expression to define the split points. Use the special regex string construction r" ":
julia> split(s, r"a|e|i|o|u")
8-element Array{SubString{String},1}:
 "Y"
 ""
 " kn"
 "w my m"
 "th"
 "ds, W"
 "ts"
 "n."
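Here, r"a|e|i|o|u" is a regular expression string. If you know regular expressions, you'll see that it matches any one of the vowels, so the resulting array consists of the string split at every vowel. Notice the empty string in the results. If you don't want empty strings, add the keyword argument keepempty=false:

julia> split(s, r"a|e|i|o|u", keepempty=false)
7-element Array{SubString{String},1}:
 "Y"
 " kn"
 "w my m"
 "th"
 "ds, W"
 "ts"
 "n."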
To glue the pieces back together, with a string of your choice between them, use join():

julia> join(split(s, r"a|e|i|o|u", keepempty=false), "aiou")
"Yaiou knaiouw my maiouthaiouds, Waioutsaioun."
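Splitting using a function
Instead of a string or regex, you can give split() a function that tests each character. The string is split at every character for which the function returns true. Here the alphabet is split at the letters whose codes are multiples of 8 (H, P, and X):

julia> split(join(Char.(65:90)), c -> Int(c) % 8 == 0)
4-element Array{SubString{String},1}:
 "ABCDEFG"
 "IJKLMNO"
 "QRSTUVW"
 "YZ"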
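Besides a predicate function, split() also accepts a collection of characters to split at. This small sketch uses made-up input. It also shows how keepempty=false drops the empty piece produced by two adjacent split points.

```julia
parts = split("one, two;three", [',', ';'])          # split at any of these characters
tokens = split("a1b22c", isdigit)                    # "22" yields an empty piece between the 2s
nonempty = split("a1b22c", isdigit; keepempty=false) # the empty piece is dropped
```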
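Remember that s[1:1] is a String while s[1] is a Char. To glue characters together, convert each one to a string first, or pass them all to string():

julia> string('s') * string('d')
"sd"

julia> string('s', 'd')
"sd"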
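It's easy to enter 32-bit Unicode characters with the \U escape (upper case for 32 bits). The lowercase escape sequence \u can be used for 16-bit and 8-bit characters: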
julia> ('\U1014d', '\u2640', '\u26')
('𐅍', '♀', '&')
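In a string, an escape reads at most eight hex digits after \U, four after \u, and two after \x. So each 2 below is an ordinary character:

julia> "\U0001014d2\U000026402\u26402\U000000a52\u00a52\U000000352\u00352\x352"
"𐅍2♀2♀2¥2¥2525252"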
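Converting an integer to a string is also a job for the string() function. The keyword base lets you specify the number base for the conversion:

julia> string(11, base=2)
"1011"

julia> string(11, base=8)
"13"

julia> string(11, base=16)
"b"

julia> string(11)
"11"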
julia> a = BigInt(2)^200
1606938044258990275541962092341162602522202993782792835301376

julia> string(a)
"1606938044258990275541962092341162602522202993782792835301376"

julia> string(a, base=16)
"1000000000000000000000000000000000000000000000000"
To convert a string to a number, use parse(). You can also specify the number base to use when interpreting the string:
julia> parse(Int, "100")
100

julia> parse(Int, "100", base=2)
4

julia> parse(Int, "100", base=16)
256

julia> parse(Float64, "100.32")
100.32

julia> parse(Complex{Float64}, "0 + 1im")
0.0 + 1.0im
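Putting string() and parse() together gives you round trips between numbers and text. This sketch also uses the pad keyword of string() and tryparse(), which returns nothing instead of throwing an error when the text isn't a valid number. The last line checks the 2^200 example: since 2^200 is 16^50, in base 16 it is a 1 followed by 50 zeros. The sample values are arbitrary.

```julia
hexstr = string(255, base=16)           # "ff"
back = parse(Int, hexstr, base=16)      # and back to 255
padded = string(5, base=2, pad=8)       # binary, zero-padded to 8 digits
bad = tryparse(Int, "221B")             # not a number: nothing, not an error
hex200 = string(BigInt(2)^200, base=16) # 2^200 == 16^50
```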
Char() converts a number to a character, and Int() converts a character back to its number:

julia> Char(8253)
'‽': Unicode U+203d (category Po: Punctuation, other)

julia> Char(0x203d) # the Interrobang is Unicode U+203d in hexadecimal
'‽': Unicode U+203d (category Po: Punctuation, other)

julia> Int('‽')
8253

julia> string(Int('‽'), base=16)
"203d"
To convert from a single-character string to its code number (such as its ASCII or UTF code number), take its first character and convert that:

julia> Int("S"[1])
83
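The same idea works in both directions for whole strings: convert each character to its code number, then back again. This is a minimal sketch with illustrative values. Note that codepoint() returns the code as an unsigned integer rather than an Int.

```julia
codes = [Int(c) for c in "Hi!"]     # the code number of each character
word = join(Char.(codes))           # convert back to characters and glue them together
cp = codepoint('\u203d')            # the interrobang, as a UInt32 code point
next_letter = Char(Int('A') + 1)    # arithmetic on codes, then back to a Char
```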
printf formatting
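If you're deeply attached to the C-style printf() function, you can use a Julia macro instead (macros are called by prefixing their name with the @ sign). The macro is provided in the Printf package, which you need to load first: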
julia> using Printf
julia> @printf("pi = %0.20f", float(pi))
pi = 3.14159265358979311600
Alternatively, you can create a new string using the @sprintf() macro, also found in the Printf package:

julia> @sprintf("pi = %0.20f", float(pi))
"pi = 3.14159265358979311600"
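Here are a few more format specifiers: left-justified strings, fixed-width floats, zero-padded integers, hex, and scientific notation. @printf can also take an IO stream as its first argument, so you can format into a buffer instead of the terminal. The values here are arbitrary examples.

```julia
using Printf

# %-8s: left-justify in 8 columns; %6.2f: 6 wide, 2 decimals; %05d: zero-pad to 5 digits
row = @sprintf("%-8s|%6.2f|%05d", "pi", float(pi), 42)
hexed = @sprintf("%x", 255)           # lowercase hexadecimal
sci = @sprintf("%.3e", 12345.678)     # scientific notation, 3 decimals

# @printf writes to any IO stream given as its first argument
io = IOBuffer()
@printf(io, "%d-%d", 2, 21)
joined = String(take!(io))
```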
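To read a string into an array, you can use IOBuffer(), which wraps a string in an in-memory stream that other functions can read from. Here's a string of data: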
julia> data = "1 2 3 4
5 6 7 8
9 0 1 2"
"1 2 3 4\n5 6 7 8\n9 0 1 2"
Now you can "read" this string using functions such as readdlm(), the "read with delimiters" function, which is found in the DelimitedFiles package:
julia> using DelimitedFiles

julia> readdlm(IOBuffer(data))
3x4 Array{Float64,2}:
 1.0  2.0  3.0  4.0
 5.0  6.0  7.0  8.0
 9.0  0.0  1.0  2.0

julia> readdlm(IOBuffer(data), Int)
3x4 Array{Int64,2}:
 1  2  3  4
 5  6  7  8
 9  0  1  2
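readdlm() also takes an explicit delimiter, and its counterpart writedlm() writes an array out again. This sketch assumes comma-separated data held in a string. It uses sprint() to capture what writedlm() writes as a String, rather than writing to a file.

```julia
using DelimitedFiles

csv = "1,2,3\n4,5,6"
m = readdlm(IOBuffer(csv), ',', Int)        # comma delimiter, parse every cell as Int

# writedlm writes rows to a stream; sprint collects that output into a String
out = sprint(io -> writedlm(io, m .* 10, ','))
```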
Some functions that work on arrays also work on strings. Take this path name:

julia> s = "/Users/me/Music/iTunes/iTunes Media/Mobile Applications";
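You can use collect() to explode the path name string into an array of Char objects. collect() gathers the items of a collection, or the characters of a string, into an array: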
julia> collect(s)
55-element Array{Char,1}:
 '/'
 'U'
 's'
 'e'
 'r'
 's'
 '/'
 ...
Similarly, split(s, "") breaks the string into one-character substrings:

julia> split(s, "")
55-element Array{SubString{String},1}:
 "/"
 "U"
 "s"
 "e"
 "r"
 "s"
 "/"
 ...
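To count how often a character occurs, use count() with a test function. It works on the array of characters and also directly on the string:

julia> count(c -> c == '/', collect(s))
6

julia> count(c -> c == '/', s)
6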
If you want to know whether a string contains a specific character, use the general-purpose in() function:
julia> s = "Elementary, my dear Watson";
julia> in('m', s)
true
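But the occursin() function, which accepts two strings, is more generally useful, because you can search for substrings of one or more characters. Notice that you place the search term first, then the string you're searching in: occursin(needle, haystack).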
julia> occursin("Wat", s)
true

julia> occursin("m", s)
true

julia> occursin("mi", s)
false

julia> occursin("me", s)
true
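To find where a substring, pattern, or character first occurs, use findfirst(needle, haystack). With a string or regex as the needle it returns the range of indices of the match. With a test function it returns a single index:

julia> s = "You know my methods, Watson.";

julia> findfirst("meth", s)
13:16

julia> findfirst(r"[aeiou]", s) # first vowel
2:2

julia> findfirst(isequal('a'), s) # first occurrence of character 'a'
23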
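findfirst() has relatives that search from other positions. This sketch assumes Julia 1.3 or later, which is when findall() gained the ability to search strings. findnext() starts at a given index, findlast() searches from the end, and findall() returns every match. All of them return nothing when there's no match.

```julia
s = "You know my methods, Watson."

first_o = findfirst("o", s)            # range of the first "o"
second_o = findnext("o", s, 3)         # start searching at index 3
last_o = findlast("o", s)              # search backwards from the end
all_o = findall("o", s)                # every match, as a vector of ranges
no_hit = findfirst("Moriarty", s)      # nothing when the needle isn't there
```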
To replace part of a string, use replace():

julia> replace("Sherlock Holmes", "e" => "ee")
"Sheerlock Holmees"

You use the => operator to specify the pattern you're looking for and its replacement. Usually the replacement is another string, as here. But you can also supply a function that processes the match:

julia> replace("Sherlock Holmes", "e" => uppercase)
"ShErlock HolmEs"

where the function (here, the built-in uppercase() function) is applied to the matching substring.
There's no replace! function for strings, where the "!" would indicate a function that changes its argument. That's because you can't change a string: strings are immutable.
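Because strings are immutable, replace() always returns a new string and leaves the original untouched. This sketch also uses the count keyword to limit the number of replacements. It then passes two pairs at once to swap two words. Passing several pairs needs Julia 1.7 or later, which is an assumption about your version.

```julia
name = "Sherlock Holmes"

once = replace(name, "e" => "ee"; count=1)   # only the first match is replaced
# Several pairs are applied in a single pass, so the two words really swap (Julia 1.7+)
swapped = replace("Holmes and Watson", "Holmes" => "Watson", "Watson" => "Holmes")
```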
Replacing using functions
Many functions in Julia allow you to supply functions as part of the function call, and you can make good use of anonymous functions for this. Here, for example, is how to use a function to provide random replacements in a replace() operation:
julia> t = "You can never foretell what any one man will do, but you can say with precision what an average number will be up to. Individuals vary, but percentages remain constant.";
julia> replace(t, r"a|e|i|o|u" => (c) -> rand(Bool) ? "0" : "1")
"Y00 c1n n0v0r f1r0t1ll wh1t 0ny 0n0 m0n w1ll d0, b0t y01 c1n s1y w0th pr1c1s10n wh0t 1n 1v0r0g0 n1mb0r w0ll b0 0p t1. Ind1v0d11ls v0ry, b0t p1rc0nt0g0s r0m01n c1nst0nt."

Because each replacement is chosen at random, running the same call again gives a different result:

julia> replace(t, r"a|e|i|o|u" => (c) -> rand(Bool) ? "0" : "1")
"Y11 c0n...n1v0r f0r1t0ll wh1t 1ny 0n1 m0n w1ll d1, b1t y10 c1n s1y w1th pr0c1s01n wh0t 0n 0v1r0g0 n1mb1r w0ll b0 1p t1. Ind1v0d01ls v0ry, b1t p0rc1nt1g0s r0m01n c1nst0nt."
You can use regular expressions to find matches for substrings. Some functions that accept a regular expression are:

replace()     changes occurrences of regular expressions
match()       returns the first match, or nothing
eachmatch()   returns an iterator that lets you search through all matches
split()       splits a string at every match
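Use replace() to replace every character that isn't a lowercase vowel (the consonants, but also the spaces, punctuation, and the capital E) with an underscore:

julia> replace("Elementary, my dear Watson!", r"[^aeiou]" => "_")
"__e_e__a________ea___a__o__"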
and the following code replaces each vowel with the results of running a function on each match:
julia> replace("Elementary, my dear Watson!", r"[aeiou]" => uppercase)
"ElEmEntAry, my dEAr WAtsOn!"
With replace() you can refer to parts of each match if you provide a special substitution string s"", where \1 refers to the first capture group, \2 to the second, and so on. With this regex operation, each lowercase letter preceded by a space is repeated three times:

julia> replace("Elementary, my dear Watson!", r"(\s)([a-z])" => s"\1\2\2\2")
"Elementary, mmmy dddear Watson!"
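Capture groups are handy for rearranging text. In this sketch, which uses sample names only, the first replacement swaps two numbered groups. The second uses named groups, which a substitution string can refer to as \g<name>.

```julia
# \2 and \1 insert the second and first captured words, in swapped order
swapped = replace("Holmes, Sherlock", r"(\w+), (\w+)" => s"\2 \1")

# Named groups (?<name>...) are referenced with \g<name>
named = replace("Watson, John", r"(?<last>\w+), (?<first>\w+)" => s"\g<first> \g<last>")
```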
For more regular expression fun, there are the match() and eachmatch() functions. Here I've loaded the complete text of "The Adventures of Sherlock Holmes" from a file into the string called text:
julia> f = "/tmp/adventures-of-sherlock-holmes.txt"

julia> text = read(f, String);
To use the possibility of a match as a Boolean condition, suitable for use in an if statement for example, use occursin():

julia> occursin(r"Opium", text)
false
That's odd. We were expecting to find evidence of the great detective's peculiar pharmacological recreations. In fact, the word "opium" does appear in the text, but only in lowercase, hence this false result: regular expressions are case-sensitive.

julia> occursin(r"(?i)Opium", text)
true
This is a case-insensitive search, set by the flag (?i), and it returns true.
You could check every line for the word using a simple loop:
julia> for l in split(text, "\n")
           occursin(r"opium", l) && println(l)
       end
opium. The habit grew upon him, as I understand, from some
he had, when the fit was on him, made use of an opium den in the
brown opium smoke, and terraced with wooden berths, like the
wrinkled, bent with age, an opium pipe dangling down from between
very short time a decrepit figure had emerged from the opium den,
opium-smoking to cocaine injections, and all the other little
steps - for the house was none other than the opium den in which
lives upon the second floor of the opium den, and who was
learn to have been the lodger at the opium den, and to have been
doing in the opium den, what happened to him when there, where is
"Had he ever showed any signs of having taken opium?"
room above the opium den when I looked out of my window and saw,
For more usable output (in the REPL), add enumerate() and some highlighting:
julia> red = Base.text_colors[:red]; default = Base.text_colors[:default];

julia> for (n, l) in enumerate(split(text, "\n"))
           occursin(r"opium", l) && println("$n $(replace(l, "opium" => "$(red)opium$(default)"))")
       end
5087 opium. The habit grew upon him, as I understand, from some
5140 he had, when the fit was on him, made use of an opium den in the
5173 brown opium smoke, and terraced with wooden berths, like the
5237 wrinkled, bent with age, an opium pipe dangling down from between
5273 very short time a decrepit figure had emerged from the opium den,
5280 opium-smoking to cocaine injections, and all the other little
5429 steps - for the house was none other than the opium den in which
5486 lives upon the second floor of the opium den, and who was
5510 learn to have been the lodger at the opium den, and to have been
5593 doing in the opium den, what happened to him when there, where is
5846 "Had he ever showed any signs of having taken opium?"
6129 room above the opium den when I looked out of my window and saw,
There's an alternative syntax for adding regex modifiers, such as case-insensitive matches. Notice the "i" immediately following the regex string in the second example:
julia> occursin(r"Opium", text)
false

julia> occursin(r"Opium"i, text)
true
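The line-by-line search above can be wrapped in a small function that works on any IO stream. That way you can test it on an IOBuffer and later pass it a file. The name grep_lines is hypothetical, and the sample text stands in for the Sherlock Holmes file. The i flag makes the search case-insensitive.

```julia
# Return (line number, line) pairs for every line of `io` that matches `r`
function grep_lines(io::IO, r::Regex)
    hits = Tuple{Int,String}[]
    for (n, line) in enumerate(eachline(io))   # eachline strips the newline
        occursin(r, line) && push!(hits, (n, line))
    end
    return hits
end

sample = IOBuffer("a quiet evening\nan opium den\nOPIUM! cried Watson\n")
hits = grep_lines(sample, r"opium"i)
```

With the real file you might call open(io -> grep_lines(io, r"opium"i), f), which also closes the file once the search is done.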
With the eachmatch() function, you apply the regex to the string to produce an iterator. For example, to look for substrings in our text matching the letter "L", followed by some other characters, ending with "ed":
julia> lmatch = eachmatch(r"L.*?ed", text)
The result in lmatch is an iterable object containing all the matches, as RegexMatch objects:
julia> collect(lmatch)[1:10]
10-element Array{RegexMatch,1}:
 RegexMatch("London, and proceed")
 RegexMatch("London is a pleasant thing indeed")
 RegexMatch("Looking for lodgings,\" I answered")
 RegexMatch("London he had received")
 RegexMatch("Lied")
 RegexMatch("Life,\" and it attempted")
 RegexMatch("Lauriston Gardens wore an ill-omened")
 RegexMatch("Let\" card had developed")
 RegexMatch("Lestrade, is here. I had relied")
 RegexMatch("Lestrade grabbed")
We can step through the iterator and look at each match in turn. You can access a number of fields of a RegexMatch to extract information about the match. These include captures, match, offset, offsets, and regex. For example, the match field contains the matched substring:
julia> for i in lmatch
           println(i.match)
       end
London - quite so! Your Majesty, as I understand, became entangled
Lodge. As it pulled
Lord, Mr. Wilson, that I was a red
League of the Red
League was founded
London when he was young, and he wanted
LSON" in white letters, upon a corner house, announced
League, and the copying of the 'Encyclopaed
Leadenhall Street Post Office, to be left till called
Let the whole incident be a sealed
Lestrade, being rather puzzled
Lestrade would have noted
Lestrade," drawled
Lestrade looked
Lord St. Simon has not already arrived
Lord St. Simon sank into a chair and passed
Lord St. Simon had by no means relaxed
Lordship. "I may be forced
London. What could have happened
London, and I had placed
Other fields include captures, the captured substrings as an array of strings; offset, the offset into the string at which the whole match begins; and offsets, the offsets of the captured substrings.
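Here are the other RegexMatch fields applied to a short sample string, so the numbers are easy to check by hand. The sketch also shows a named capture group, which you can look up by indexing the match with a Symbol.

```julia
m = match(r"(\w+), (\w+)", "Elementary, my dear Watson")

whole = m.match            # the full matched text
groups = m.captures        # one entry per capture group
match_start = m.offset     # index where the whole match begins
group_starts = m.offsets   # index where each capture group begins

# Named groups can be retrieved by name
m2 = match(r"(?<first>\w+) (?<last>\w+)", "Sherlock Holmes")
surname = m2[:last]
```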
To get an array of matching strings, use something like this:
julia> collect(m.match for m in eachmatch(r"L.*?ed", text))
58-element Array{SubString{String},1}:
 "London - quite so! Your Majesty, as I understand, became entangled"
 "Lodge. As it pulled"
 "Lord, Mr. Wilson, that I was a red"
 "League of the Red"
 "League was founded"
 "London when he was young, and he wanted"
 "Leadenhall Street Post Office, to be left till called"
 "Let the whole incident be a sealed"
 "Lestrade, being rather puzzled"
 "Lestrade would have noted"
 "Lestrade looked"
 "Lestrade laughed"
 "Lestrade shrugged"
 "Lestrade called"
 ...
 "Lord St. Simon shrugged"
 "Lady St. Simon was decoyed"
 "Lestrade,\" drawled"
 "Lestrade looked"
 "Lord St. Simon has not already arrived"
 "Lord St. Simon sank into a chair and passed"
 "Lord St. Simon had by no means relaxed"
 "Lordship. \"I may be forced"
 "London. What could have happened"
 "London, and I had placed"
The basic match() function looks for the first match for your regex. Use the match field to extract the information from the RegexMatch object:
julia> match(r"She.*", text).match
"Sherlock Holmes she is always THE woman. I have seldom heard\r"
A more streamlined way of obtaining matching lines from a file is this:
julia> f = "adventures of sherlock holmes.txt"

julia> filter(s -> occursin(r"(?i)Opium", s), map(chomp, readlines(open(f))))
12-element Array{SubString{String},1}:
 "opium. The habit grew upon him, as I understand, from some"
 "he had, when the fit was on him, made use of an opium den in the"
 "brown opium smoke, and terraced with wooden berths, like the"
 "wrinkled, bent with age, an opium pipe dangling down from between"
 "very short time a decrepit figure had emerged from the opium den,"
 "opium-smoking to cocaine injections, and all the other little"
 "steps - for the house was none other than the opium den in which"
 "lives upon the second floor of the opium den, and who was"
 "learn to have been the lodger at the opium den, and to have been"
 "doing in the opium den, what happened to him when there, where is"
 "\"Had he ever showed any signs of having taken opium?\""
 "room above the opium den when I looked out of my window and saw,"
Making a Regex
Sometimes you want to make a regular expression from within your code. You can do this by making a Regex object. Here is one way you could count the number of vowels in the text:
f = open("sherlock-holmes.txt")
text = read(f, String)
close(f)

for vowel in "aeiou"
    r = Regex(string(vowel))                       # build a regex from a string at run time
    l = [m.match for m = eachmatch(r, text)]
    println("there are $(length(l)) letter \"$vowel\"s in the text.")
end
there are 219626 letter "a"s in the text.
there are 337212 letter "e"s in the text.
there are 167552 letter "i"s in the text.
there are 212834 letter "o"s in the text.
there are 82924 letter "u"s in the text.
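The loop above scans the whole text five times, once per vowel. The sketch below is a single-pass alternative that uses one character-class regex and tallies the matches in a Dict. The function name vowel_counts is hypothetical. The sketch also shows that Regex() takes an optional second argument of flags, such as "i" for case-insensitive matching.

```julia
function vowel_counts(text::AbstractString)
    counts = Dict(c => 0 for c in "aeiou")
    for m in eachmatch(r"[aeiou]", text)
        counts[m.match[1]] += 1     # m.match is a one-character SubString; [1] is its Char
    end
    return counts
end

counts = vowel_counts("elementary my dear watson")

# Flags go in the second argument: "i" behaves like the r"..."i suffix
r = Regex("opium", "i")
found = occursin(r, "The OPIUM den")
```

The trade-off is one pass and one Dict update per match, instead of five separate passes over a long text.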
There are lots of functions for testing and changing strings:

length(str)              number of characters in str
sizeof(str)              size of str in bytes
startswith(strA, strB)   does strA start with strB?
endswith(strA, strB)     does strA end with strB?
occursin(strA, strB)     does strA occur in strB?
all(isletter, str)       is str entirely letters?
all(isnumeric, str)      is str entirely number characters?
isascii(str)             is str ASCII?
all(iscntrl, str)        is str entirely control characters?
all(isdigit, str)        is str entirely the digits 0-9?
all(ispunct, str)        does str consist of punctuation?
all(isspace, str)        is str entirely whitespace characters?
all(isuppercase, str)    is str entirely uppercase?
all(islowercase, str)    is str entirely lowercase?
all(isxdigit, str)       is str entirely hexadecimal digits?
uppercase(str)           return a copy of str converted to uppercase
lowercase(str)           return a copy of str converted to lowercase
titlecase(str)           return a copy of str with the first character of each word converted to uppercase
uppercasefirst(str)      return a copy of str with the first character converted to uppercase
lowercasefirst(str)      return a copy of str with the first character converted to lowercase
chop(str)                return a copy of str with the last character removed
chomp(str)               return a copy of str with the last character removed, but only if it's a newline
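Here are a few of these functions applied to sample strings. Note how all() with a predicate tests every character, so a single space or comma makes all(isletter, str) false. Note also how length() and sizeof() differ for non-ASCII text.

```julia
s = "Elementary, my dear Watson"

begins = startswith(s, "Elem")
ends = endswith(s, "Watson")
only_letters = all(isletter, "Watson")
mixed = all(isletter, s)                       # false: the spaces and the comma aren't letters
title = titlecase("the red-headed league")     # the hyphen counts as a word separator
capitalized = uppercasefirst("watson")
chopped = chop("Watson!")                      # always drops the last character
chomped = chomp("a line\n")                    # drops only a trailing newline
sizes = (length("\u00e9"), sizeof("\u00e9"))   # é: one character, two bytes
```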
To write to a string, you can use a Julia stream. The sprint() (String Print) function takes a function as its first argument. It calls that function with a stream and the rest of the arguments, and returns whatever was written to the stream as a string.
For example, consider the following function, f. The body of the function maps an anonymous print function over the arguments, enclosing each one in angle brackets. When used by sprint, the function f processes the remaining arguments and sends them to the stream.
julia> function f(io::IO, args...)
           map((a) -> print(io, "<", a, ">"), args)
       end
f (generic function with 1 method)
julia> sprint(f, "fred", "jim", "bill", "fred blogs")
"<fred><jim><bill><fred blogs>"
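sprint() works with any function that takes an IO stream as its first argument, including print, show, and the IO form of join(). Here is a minimal sketch with illustrative values.

```julia
printed = sprint(print, "Baker Street, ", 221, 'B')   # what print would have written
shown = sprint(show, "Watson")                        # show includes the quotes
joined = sprint(io -> join(io, ["a", "b", "c"], ", "))  # join can write straight to a stream
```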
Functions like println() can take an IOBuffer or stream as their first argument. This lets you print to streams instead of printing to the standard output device:
julia> iobuffer = IOBuffer()
IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true, append=false, size=0, maxsize=Inf, ptr=1, mark=-1)
julia> for i in 1:100
           println(iobuffer, string(i))
       end
After this, the in-memory stream called iobuffer is full of numbers and newlines, even though nothing was printed on the terminal. To copy the contents of iobuffer from the stream to a string or array, you can use take!():
julia> String(take!(iobuffer))
"1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n11\n12\n13\n14 ... \n98\n99\n100\n"
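A common pattern is to build a large string piece by piece in an IOBuffer and convert it once at the end, rather than concatenating many small strings with *. This is a minimal sketch with a hypothetical helper name. Note that take!() also empties the buffer.

```julia
function numbered_lines(items)
    io = IOBuffer()
    for (n, item) in enumerate(items)
        println(io, n, ". ", item)    # println writes all its arguments, then a newline
    end
    return String(take!(io))          # take! hands over the bytes and empties the buffer
end

report = numbered_lines(["Holmes", "Watson"])
```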