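Zero-Width Lookbehind Assertions in C++ Regular Expressions
Zero-width assertions in regular expressions
Use case: matching, extracting, searching for, or replacing a string that starts with xxx or ends with xxx, without including xxx itself.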
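Assertion | Name | Usage | Meaning
(?=exp) | zero-width positive lookahead | exp1(?=exp2) | exp1 must be followed by a match of exp2; exp2 is not part of the match
(?!exp) | zero-width negative lookahead | exp1(?!exp2) | exp1 must not be followed by a match of exp2
(?<=exp) | zero-width positive lookbehind | (?<=exp0)exp1 | exp1 must be preceded by a match of exp0; exp0 is not part of the match
(?<!exp) | zero-width negative lookbehind | (?<!exp0)exp1 | exp1 must not be preceded by a match of exp0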
Example: in a flavor that supports lookbehind, the regular expression that extracts 123 from 【123】 is (?<=【)\d+(?=】)
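The two lookahead forms can be tried directly with std::regex, which accepts them. The following is a minimal sketch; the helper name first_match and the sample string are illustrative assumptions, not part of the original post.

```cpp
#include <regex>
#include <string>

// Illustrative helper (hypothetical name): returns the first match of
// `pattern` in `text`, or an empty string when nothing matches.
std::string first_match(const std::string& text, const std::string& pattern) {
    const std::regex r(pattern);  // default grammar: ECMAScript
    std::smatch m;
    return std::regex_search(text, m, r) ? m.str(0) : std::string();
}
```

With the sample text "width=100px height=20em", the positive lookahead pattern [0-9]+(?=px) yields 100, and the negative lookahead pattern [0-9]+(?![0-9]|px) yields 20. In both cases the text examined by the assertion is not part of the match.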
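Problem description
The goal is to match strings of the form qq=123456 and extract 123456 without the qq= prefix. The obvious approach is a zero-width lookbehind, and the pattern is easy to write: (?<=qq=)[0-9]+. In C++, however, the program fails at run time: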
terminate called after throwing an instance of 'std::regex_error'
what():  Invalid special open parenthesis.
Aborted (core dumped)
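Analysis
The regular expressions in the C++ standard library (std::regex) currently do not support zero-width lookbehind, that is, the (?<=exp) and (?<!exp) syntax. Zero-width lookahead is supported. The exception is thrown when the regex object is constructed from the pattern, before any matching takes place.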
Finally, flavors like std::regex and Tcl do not support lookbehind at all, even though they do support lookahead. JavaScript was like that for the longest time since its inception. But now lookbehind is part of the ECMAScript 2018 specification. As of this writing (late 2019), Google’s Chrome browser is the only popular JavaScript implementation that supports lookbehind. So if cross-browser compatibility matters, you can’t use lookbehind in JavaScript.
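Because the failure surfaces as a std::regex_error thrown by the regex constructor, it can be caught instead of aborting the program. A minimal sketch, assuming a hypothetical helper named compiles:

```cpp
#include <regex>
#include <string>

// Illustrative helper (hypothetical name): reports whether `pattern` is
// accepted by std::regex under its default ECMAScript grammar.
bool compiles(const std::string& pattern) {
    try {
        const std::regex r(pattern);  // throws std::regex_error on invalid syntax
        (void)r;                      // only the construction matters here
        return true;
    } catch (const std::regex_error&) {
        return false;                 // e.g. the lookbehind syntax (?<=...)
    }
}
```

Here compiles("(?<=qq=)[0-9]+") is expected to return false, while a lookahead pattern such as [0-9]+(?=;) is accepted. The text of the error message, if it is printed via what(), differs between standard library implementations.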
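Solutions
1. Pass an optional syntax flag when constructing the regex to select a different regular expression grammar ==> tested, does not work
2. Wrap the part to extract in () so that it becomes a separate subexpression (capture group) ==> tested, works
3. Use Boost regex, which supports lookbehind ==> not tested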
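The negative result for option 1 can be reproduced with a small probe over the six grammar flags defined in std::regex_constants. This is only a sketch: the function name grammars_extracting is hypothetical, and whether a given grammar rejects the pattern or accepts it with a different meaning (for example, treating the parentheses as literal characters) depends on the grammar and the implementation. The probe only checks whether any grammar extracts the wanted digits as the whole match.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Illustrative probe (hypothetical name): returns the names of the standard
// grammars under which the lookbehind pattern both compiles and yields
// `wanted` as the whole match in `text`.
std::vector<std::string> grammars_extracting(const std::string& text,
                                             const std::string& wanted) {
    namespace rc = std::regex_constants;
    const std::vector<std::pair<std::string, rc::syntax_option_type>> grammars = {
        {"ECMAScript", rc::ECMAScript}, {"basic", rc::basic},
        {"extended", rc::extended},     {"awk", rc::awk},
        {"grep", rc::grep},             {"egrep", rc::egrep}};

    std::vector<std::string> ok;
    for (const auto& g : grammars) {
        try {
            const std::regex r("(?<=qq=)[0-9]+", g.second);
            std::smatch m;
            if (std::regex_search(text, m, r) && m.str(0) == wanted) {
                ok.push_back(g.first);
            }
        } catch (const std::regex_error&) {
            // This grammar rejects the pattern outright.
        }
    }
    return ok;
}
```

For the input "[optional]qq=123456;" and the wanted value "123456", the returned list is expected to be empty, since none of the standard grammars defines lookbehind.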
Example code
#include <iostream>
#include <regex>
#include <string>
using namespace std;
using namespace regex_constants;
int main()
{
    string seq = "[optional]qq=123456;";
    string pattern_nok = "(?<=qq=)[0-9]+"; // lookbehind: std::regex rejects this pattern with a regex_error at run time
    string pattern = "qq=([0-9]+)"; // put the digits in their own subexpression
    regex r(pattern /*, extended*/); // a different grammar can be selected here, but it does not help
    smatch results;
    if (regex_search(seq, results, r))
    {
        cout << results[0] << endl; // print the whole match
        cout << results[1] << endl; // print the first subexpression
    }
}
Output
qq=123456
123456
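The same capture-group idea carries over to the other use cases mentioned at the top: collecting every occurrence and replacing only the part after the prefix. The following is a sketch under stated assumptions; the function names extract_all_qq and mask_qq and the *** placeholder are made up for illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects the digits after every "qq=" in `text`, i.e. what
// (?<=qq=)[0-9]+ would match in a flavor with lookbehind.
std::vector<std::string> extract_all_qq(const std::string& text) {
    static const std::regex r("qq=([0-9]+)");
    std::vector<std::string> out;
    for (std::sregex_iterator it(text.begin(), text.end(), r), end; it != end; ++it) {
        out.push_back((*it)[1].str());  // group 1 holds only the digits
    }
    return out;
}

// Replaces only the digits and keeps the prefix: the prefix is captured
// as group 1 and written back with $1 in the replacement string.
std::string mask_qq(const std::string& text) {
    static const std::regex r("(qq=)[0-9]+");
    return std::regex_replace(text, r, "$1***");
}
```

For example, extract_all_qq("a qq=123456; b qq=42") returns 123456 and 42, and mask_qq("[optional]qq=123456;") returns [optional]qq=***;. Capturing the prefix and re-emitting it takes the place of the lookbehind that std::regex lacks.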