The re module

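Python uses the re module for regular-expression matching and related operations on strings. This post covers a few of the most commonly used functions.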

About regular expressions

re.match()

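The match function only matches at the beginning of the string. If the pattern does not match starting at the first character, the match fails.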

Basic usage:

re.match(pattern, string, flags=0)

The pattern argument is the regular expression to match, string is the string to match against, and flags is optional (for example re.IGNORECASE).
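As a minimal sketch of the optional flags argument, the example below passes re.IGNORECASE so that an uppercase scheme and host still match a lowercase pattern. The input string is made up for illustration.

```python
import re

# Hypothetical input with an uppercase scheme and host
text = "HTTP://WWW.BAIDU.COM/index"
pattern = r"http://www\.[a-z0-9-]+\.com"

# Without flags, the lowercase pattern does not match
print(re.match(pattern, text))
# None

# re.IGNORECASE makes the whole pattern case-insensitive
m = re.match(pattern, text, flags=re.IGNORECASE)
print(m.group())
# HTTP://WWW.BAIDU.COM
```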

On success it returns a match object; if there is no match it returns None.
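Because a failed match returns None, calling .group() directly on the result would raise AttributeError. A common way to handle this is to test the result first; the helper name extract_url below is hypothetical, chosen only for this sketch.

```python
import re

PATTERN = r"http://www\.[a-zA-Z0-9-]+\.com"

def extract_url(text):
    """Hypothetical helper: return the URL at the start of text, or None."""
    m = re.match(PATTERN, text)
    if m is None:
        # The pattern did not match at the start of the string
        return None
    return m.group()

print(extract_url("http://www.baidu.comfdksjflskjfasdf"))
# http://www.baidu.com
print(extract_url("xxhttp://www.baidu.com"))
# None
```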

For example, to match the URL http://www.baidu.com at the start of a string:

import re

# String to match against (named text to avoid shadowing the built-in str)
text = "http://www.baidu.comfdksjflskjfasdf"
# Pattern to match
pattern = r"http://www\.[a-zA-Z0-9-]+\.com"
# Perform the match
url = re.match(pattern, text)
# Print the match object
print(url)
# <re.Match object; span=(0, 20), match='http://www.baidu.com'>

# Print the matched text
print(url.group())
# http://www.baidu.com

re.search()

The search function scans through the entire string and returns the first match it finds.

Basic usage:

re.search(pattern, string, flags=0)

The parameters are the same as for match.

import re

# The text to match is not at the start of the string
text = "oiwhjdfolksnhttp://www.baidu.comfdksjflskjfasdf"
pattern = r"http://www\.[a-zA-Z0-9-]+\.com"
url = re.search(pattern, text)
print(url)
# <re.Match object; span=(12, 32), match='http://www.baidu.com'>
print(url.group())
# http://www.baidu.com
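The difference from match is easiest to see by running both functions on the same string. In this sketch the URL does not start at position 0, so match returns None while search still finds it:

```python
import re

text = "oiwhjdfolksnhttp://www.baidu.comfdksjflskjfasdf"
pattern = r"http://www\.[a-zA-Z0-9-]+\.com"

# match only tries position 0, where the string starts with "oiwh..."
print(re.match(pattern, text))
# None

# search scans forward and returns the first match anywhere in the string
m = re.search(pattern, text)
print(m.span(), m.group())
# (12, 32) http://www.baidu.com
```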

re.findall()

findall takes the same arguments as the previous two, but instead of a single match object it returns a list of all non-overlapping matches.

import re

# The string contains several matches
text = "osnhttp://www.baidu.comfdksjflhttp://www.bing.comskjf"
pattern = r"http://www\.[a-zA-Z0-9-]+\.com"
urls = re.findall(pattern, text)
print(urls)
# ['http://www.baidu.com', 'http://www.bing.com']
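One detail worth knowing: if the pattern contains a capturing group, findall returns the captured text instead of the whole match. The sketch below wraps the host part in parentheses to extract only the domain names from the same string, and also shows that findall returns an empty list (not None) when nothing matches:

```python
import re

text = "osnhttp://www.baidu.comfdksjflhttp://www.bing.comskjf"
# The parentheses capture only the part between "www." and ".com"
pattern = r"http://www\.([a-zA-Z0-9-]+)\.com"

names = re.findall(pattern, text)
print(names)
# ['baidu', 'bing']

# With no match at all, findall returns an empty list rather than None
print(re.findall(pattern, "no urls here"))
# []
```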

05 November 2024