Notes on Sequence Points and Side Effects in C
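There are plenty of introductions to sequence points online. I read a few of them and summarize them here. Section 5.1.2.3 of the C99 standard deals with sequence points. A sequence point is a point in the execution of a program with a special property: all side effects of the evaluations before it have taken place, and no side effects of the evaluations after it have happened yet. The standard also requires that, between two sequence points, an object have its stored value modified at most once. At a sequence point everything is settled. Between sequence points, you cannot be sure that a variable's value has stabilized. In short, a sequence point in C marks a place where values are guaranteed to be definite. How should we understand this? First, let's look at what a side effect is.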
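Every expression has a value, and often when we write an expression we only want that value. Some expressions, however, also have side effects, while others have none, and sometimes it is precisely the side effect that we rely on. For example: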
int a = 10;
int b = a;    /* The expression a has no side effect here; we only want */
              /* its value, 10. The expression b = a does have a side  */
              /* effect: it changes the value of b to the value of a.   */
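That is what a side effect of an expression is, and a great deal of useful work gets done only through side effects. Some expressions produce both a value and a side effect. The expression i++, for example, yields a value (the value of i before the increment) and, as a side effect, increments i.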
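A minimal sketch of the two halves of i++, assuming only standard C. The struct incr_result and the function post_increment_demo are hypothetical names used for illustration; the code records the value the expression yields and the value of i after the side effect has taken place:

```c
/* Outcome of evaluating i++ once (hypothetical helper type). */
struct incr_result {
    int value;   /* value yielded by the expression i++ */
    int after;   /* value of i once the side effect has taken place */
};

struct incr_result post_increment_demo(int start)
{
    int i = start;
    int value = i++;                      /* yields the old value; ';' ends the full expression */
    struct incr_result r = { value, i };  /* i has already been incremented here */
    return r;
}
```

For instance, post_increment_demo(10) reports a value of 10 and an updated i of 11.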
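Between two sequence points, modifying a variable more than once, or modifying it and also reading it for some other purpose, causes trouble. The classic example: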
int a[4];
int i = 1;
a[i] = i++;
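Between the same pair of sequence points this code modifies i and also reads it. What does it do: a[1] = 1, or a[2] = 1? There is no telling, because the behavior is undefined. Code like this has no value, your boss certainly won't appreciate such "concise" code, and you'll get fired. An even more classic example: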
int i = 1;
printf("%d, %d, %d\n", i++, i++, i++);
i = 1;
printf("%d\n", i++ + i++ + i++);
i = 1;
printf("%d\n", ++i + ++i + ++i);
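Many university C teachers cover this problem, mine included, and when mine explained it I didn't understand it.
In fact, this problem isn't worth teaching; it is a fight with the compiler. Different compilers may give different results (although common compilers may happen to agree, which leads programmers to draw the wrong conclusions on their own), and code whose result depends on the implementation is of no use. i++ + i++ + i++ is a single expression that modifies i three times with no sequence point in between, so the result is undefined. This raises another interesting question. Some people may think that after this statement i has been incremented three times, so it must be 4. That isn't guaranteed either. Many compilers do produce 4, but the C standard says that if an object is modified more than once between two sequence points, the behavior is undefined, so nothing at all is guaranteed. For instance, suppose a compiler reads the value of i only once in i++ + i++ + i++ and keeps using it. For the first i++, i is 1, so afterwards i is 2. For the second, since we assumed i is read only once, i is still taken to be 1 and is again set to 2 (no sequence point has been passed, so i is not guaranteed to be 2 yet). After three rounds of going from 1 to 2, i ends up as 2 rather than the expected 4. It all depends on how the compiler implements it, and since it does, code like this deserves to get you fired too.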
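If a particular order is what the programmer actually means, it can be spelled out with separate statements, each ending in a sequence point. A minimal sketch: the helper names are hypothetical, and the meaning chosen for a[i] = i++ ("store i at a[i], then advance i") is an assumption, because the original expression has no defined meaning at all:

```c
/* Instead of a[i] = i++;  store i at a[i] first, then advance i. */
int store_then_advance(int a[], int i)
{
    a[i] = i;   /* with i == 1 this stores a[1] = 1 */
    i++;        /* separate statement: the ';' before it is a sequence point */
    return i;
}

/* Instead of i++ + i++ + i++;  read and increment i one step at a time. */
int sum_three_post_increments(int *pi)
{
    int sum = (*pi)++;  /* old value, then *pi is incremented */
    sum += (*pi)++;     /* each statement modifies *pi exactly once */
    sum += (*pi)++;
    return sum;         /* with *pi == 1 on entry: 1 + 2 + 3 == 6, and *pi ends at 4 */
}
```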
1. A very accessible explanation from ChinaUnix, which puts it well:
In C, a statement that consists of a single expression, such as
x = (i++) * 2;
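is called an "expression statement". The ";" at the end of an expression statement is one of the sequence points defined by the C standard, but that does not mean every ";" is a sequence point, nor that this is the only kind of sequence point. The sequence points defined in the standard are: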
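(1) At a function call, after all arguments in the argument list have been evaluated and before the first instruction of the function executes (note that the "," separating the arguments is not a sequence point);
(2) At the end of the left operand of the && operator;
(3) At the end of the left operand of the || operator;
(4) At the end of the first operand of the ?: operator;
(5) At the comma operator;
(6) At the end of a full expression, specifically: the end of the initializer of an automatic object; the semicolon ending an expression statement; the closing parenthesis of the controlling expression of a do/while/if/switch/for statement; each of the two semicolons in the controlling part of a for statement; and the end of the return value computation (the trailing semicolon) of a return statement.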
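Point (1) can be observed directly. In this sketch the global counter and the function observe are hypothetical; because of the sequence point just before the call, the callee already sees the incremented counter, even though the argument carries the old value:

```c
static int counter = 0;  /* hypothetical global, used only for this illustration */

static int observe(int arg)
{
    /* The side effect of counter++ in the caller is complete by now. */
    return counter * 100 + arg;
}

int call_with_increment(void)
{
    counter = 5;
    return observe(counter++);  /* arg is 5, but counter is already 6 inside observe */
}
```

Here call_with_increment() returns 605: 6 from the global, 5 from the argument.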
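Sequence points are defined to remove as much ambiguity as possible from the way compilers interpret expressions; where sequence points still cannot settle an ambiguity, the standard leaves the implementation free to choose. To understand sequence points, start from the purpose for which they were defined.
Another example: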
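y = (x++, x + 1);
Given that x = 2 before this statement executes, what is y?
The comma operator is a sequence point, so the value of the parenthesized expression is well-defined: it is 4. By the definition of a sequence point, all side effects of the expression before the comma, x++, have taken place before x+1 is evaluated, so x is 3 when x+1 is computed. In this example the sequence point successfully removes the ambiguity. The parentheses matter: assignment binds more tightly than the comma operator, so without them y = x++, x+1; parses as (y = x++), (x+1); and leaves y equal to 2.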
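Notice how the ambiguity is removed: the sequence point in the middle makes the expression satisfy the condition that an object's value is changed at most once between adjacent sequence points.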
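A small sketch contrasting y = (x++, x + 1); with y = x++, x+1; (the function names are hypothetical). Both are well-defined, since each has a sequence point at the comma; they differ only in how the statement is parsed:

```c
int comma_parenthesized(int x)
{
    int y = (x++, x + 1);  /* x++ completes (x becomes 3), then x + 1 yields 4 */
    return y;
}

int comma_unparenthesized(int x)
{
    int y;
    y = x++, x + 1;        /* parsed as (y = x++), (x + 1); so y gets the old x */
    return y;
}
```

With x = 2, the first function returns 4 and the second returns 2. A compiler will typically warn that the right-hand operand in the second form has no effect, which is itself a hint that the statement does not mean what it appears to.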
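y = (x++) * (x++); given x = 2 beforehand, what is y?
The answer: this expression contains no sequence point, so no sequence point can remove the ambiguity. Because x is modified twice between sequence points, the behavior is undefined. Generated code that makes y 4, 6 (or any of several other values) all conforms to the standard, and the programmer who wrote an expression that violates the sequence point rules is responsible for the consequences.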
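I apologize for not explaining this better, but I really don't intend to say more about it. Instead, let me quote a passage from Expert C Programming to give myself a graceful exit: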
However, the problem with standards manuals is that they only make sense if you already know what they mean. If people write them in English, the more precise they try to be, the longer, duller and more obscure they become. If they write them using mathematical notation to define the language, the manuals become inaccessible to too many people.
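Natural language is inherently imprecise, so explanations often get murkier the more you add, while precise mathematical language is beyond what most people, myself included, can understand and use.
A sequence point is a special point in a program's execution sequence. Where a sequence point exists, the expressions before it must be fully evaluated, and their side effects must have taken place, before the expressions after it and their side effects are evaluated.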
2. What is a side effect? An example:
int a = 5;
int b = a++;
In the statement that initializes b, the expression a++ has a side effect: after yielding the current value of a, 5, it adds 1 to a.
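Which tokens create sequence points?
The comma "," creates a sequence point.
The comma operator joins several expressions into a single expression. For example: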
int b = 5;
++ b;
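can be combined with the comma operator into a single expression statement (the declaration must stay separate: in int b = 5, ++b; the comma would separate declarators rather than act as the comma operator, and that line does not compile):
int b;
b = 5, ++b;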
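Because the comma operator introduces a sequence point, the expression on its left is evaluated first, and any side effect it has takes effect, before the expression on its right is processed.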
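A minimal sketch of the comma operator in an expression statement, with the declaration of b kept separate (the function name is hypothetical):

```c
int comma_assign_then_increment(void)
{
    int b;
    b = 5, ++b;   /* b = 5 is complete at the comma before ++b runs */
    return b;     /* so b ends at 6 */
}
```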
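&& and || create sequence points
The logical AND operator && and the logical OR operator || create sequence points.
Because && short-circuits, the expression on its left must be fully evaluated first; if it is false (0), the expression on its right is not evaluated at all and the result is false (0).
|| works the same way, except that it skips the right-hand expression when the left-hand one is true (nonzero), in which case the result is 1.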
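Short-circuiting can be made visible by counting how often the right operand runs. In this sketch, calls, right_operand, and_calls and or_calls are all hypothetical names:

```c
static int calls = 0;  /* hypothetical counter of right-operand evaluations */

static int right_operand(int v)
{
    calls++;
    return v;
}

/* How many times did the right operand run while evaluating lhs && ...? */
int and_calls(int lhs)
{
    calls = 0;
    (void)(lhs && right_operand(1));
    return calls;
}

/* Same question for lhs || ... */
int or_calls(int lhs)
{
    calls = 0;
    (void)(lhs || right_operand(1));
    return calls;
}
```

and_calls(0) and or_calls(1) both return 0, because the left operand alone decides the result; and_calls(1) and or_calls(0) return 1.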
The "?" in ?: creates a sequence point
The "?" of the ternary operator ?: creates a sequence point. For example:
int a = 5;
int b = a++ > 5? 0 : a;
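What is b? Because there is a sequence point at "?", the expression to its left must be fully evaluated first. When a++ > 5 compares with 5, the value used is the one a had before the increment, so the condition is false. Because of the sequence point at "?", the side effect of the left-hand expression must also take effect right away, so a is incremented to 6. Since the condition is false, ?: yields the operand to the right of ":", which is a. At this point a is 6, so b is 6.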
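The same example wrapped in a hypothetical function so that both branches can be tried; a is passed by value, so the caller's variable is not affected:

```c
int ternary_demo(int a)
{
    /* Sequence point after the condition: a is already incremented
       when the second or third operand is evaluated. */
    int b = a++ > 5 ? 0 : a;
    return b;
}
```

ternary_demo(5) returns 6 (the condition 5 > 5 is false, then a is read as 6), and ternary_demo(6) returns 0 (6 > 5 is true).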
Since sequence points matter this much, let's go over some important ones; programmers should make a habit of collecting these themselves.
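1). An important sequence point is at the end of a full expression. A full expression is an expression that is not a subexpression of a larger expression; this definition deserves careful thought.
int i = 1;
i++;     /* i++ is a full expression */
i++ + 1; /* here i++ is not a full expression: it is part of the full expression i++ + 1 */
For the precise kinds of full expressions, consult the relevant references; the C99 standard is a good choice.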
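2). The comma expression. The operands of a comma expression are evaluated strictly in order, with a sequence point between each pair. So if an operand is i++, the operands after it can rely on i holding its original value plus 1 (overflow aside). For example: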
int i = 1;
i++, i++, i++;
printf("%d\n", i);
i is now guaranteed to be 4.
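A minimal sketch of the same chain (the function name is hypothetical) that also shows that the value of a whole comma expression is the value of its last operand:

```c
int comma_chain(int *pi)
{
    /* Each comma is a sequence point, so the increments happen left to right.
       The whole expression takes the value of its last operand. */
    int last = ((*pi)++, (*pi)++, (*pi)++);
    return last;  /* with *pi == 1 on entry: last == 3 and *pi ends at 4 */
}
```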
3). The && and || operators. Short-circuit evaluation can be used to avoid division by zero, as follows:
int a = 10;
int b = 0;
if (b && a/b)
{ /* some code here */ }
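Evaluating b short-circuits the expression: since b is 0, a/b is never executed, so the division is safe. Each of these two operators can be treated as containing a sequence point. If the value of the first operand already determines whether the whole expression is true or false, the second operand is not evaluated, because it would be redundant. In the example above, a/b is redundant, must not be evaluated, and would fail if it were. The order of evaluation of the two operands is guaranteed.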
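A minimal sketch of the guard, with hypothetical function names. Note that b && a/b yields only 0 or 1; if the quotient itself is needed, ?: gives the same protection, and the fallback value 0 is an arbitrary choice for this illustration:

```c
/* && evaluates b first; when b == 0 the division never executes. */
int nonzero_quotient(int a, int b)
{
    return b && a / b;   /* 1 if b != 0 and a / b != 0, otherwise 0 */
}

/* When the quotient itself is wanted: the condition is sequenced before a / b. */
int quotient_or_zero(int a, int b)
{
    return b != 0 ? a / b : 0;
}
```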
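4). The conditional operator ? :. There is a sequence point at the question mark as well, and there isn't much more to say: the condition before the question mark and the operand evaluated after it may both access and modify the same variable, and this is safe.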
Finally, the lack of a fixed evaluation order within an expression shows up in one more way:
funa() + funb() + func();
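The C standard does not specify which of these three functions is called first. If the order matters, temporary variables can be used to enforce it.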
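A sketch of the temporary-variable approach. Only the names funa, funb and func come from the text; their bodies, the record helper and the log buffer are hypothetical, added so that the call order can be observed:

```c
static char log_buf[8];  /* hypothetical log of call order */
static int log_len = 0;

static int record(char c, int v)
{
    log_buf[log_len++] = c;
    log_buf[log_len] = '\0';
    return v;
}

static int funa(void) { return record('a', 1); }
static int funb(void) { return record('b', 2); }
static int func(void) { return record('c', 3); }

/* Each call is its own full expression, so the order is fixed: a, b, c. */
int ordered_sum(void)
{
    log_len = 0;
    int ra = funa();
    int rb = funb();
    int rc = func();
    return ra + rb + rc;
}
```

Written as funa() + funb() + func(), the sum would still be 6, but the contents of log_buf could legitimately differ between compilers.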
Execution order between sequence points
The example given in …:
int i = 3;
int ans = (++i)+(++i)+(++i);
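There is no sequence point between the three (++i) operands, so in what order are they executed? With gcc, the compiled code performs two ++i operations and adds them, then performs the third ++i and adds again.
With Microsoft VC++, the compiled code performs all three ++i operations first and then adds. The two results differ; which compiler is right?
Neither is wrong. The expression modifies i three times with no intervening sequence point, so its behavior is undefined. Moreover, the C standard leaves the order of execution between two sequence points unspecified, within the constraints imposed by operator precedence and associativity. This freedom leaves room for compiler optimization.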
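Knowing this rule, we should avoid incrementing the same variable more than once in a single line of code, because the compiler's behavior is unpredictable. If (++i)+(++i)+(++i) were changed to (++a)+(++b)+(++c), where a, b and c are distinct variables, the result would be the same regardless of the order in which ++a, ++b and ++c are evaluated.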
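A minimal sketch of both cases (the function names are hypothetical): distinct variables make the single expression safe, and when the same variable really must be incremented three times, separate statements fix the order:

```c
/* Three distinct objects, each modified once: the order no longer matters. */
int distinct_preincrements(int a, int b, int c)
{
    return (++a) + (++b) + (++c);   /* (1, 2, 3) gives 2 + 3 + 4 == 9 */
}

/* Same variable, three increments: one full expression per increment. */
int sequential_preincrements(int i)
{
    int first = ++i;
    int second = ++i;
    int third = ++i;
    return first + second + third;  /* i == 3 on entry gives 4 + 5 + 6 == 15 */
}
```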
3. MISRA-C:2004 warns users as follows:
Rule 12.2 (required): The value of an expression shall be the same under any order of evaluation that the standard permits. [Unspecified 7–9; Undefined 18]
Apart from a few operators (notably the function call operator (), &&, ||, ?: and , (comma)) the order in which sub-expressions are evaluated is unspecified and can vary. This means that no reliance can be placed on the order of evaluation of sub-expressions, and in particular no reliance can be placed on the order in which side effects occur. Those points in the evaluation of an expression at which all previous side effects can be guaranteed to have taken place are called “sequence points”. Sequence points and side effects are described in sections 5.1.2.3, 6.3 and 6.6 of ISO 9899:1990 [2].
Note that the order of evaluation problem is not solved by the use of parentheses, as this is not a precedence issue.
The following notes give some guidance on how dependence on order of evaluation may occur, and therefore may assist in adopting the rule.
increment or decrement operators
As an example of what can go wrong, consider
x = b[i] + i++;
This will give different results depending on whether b[i] is evaluated before i++ or vice versa. The problem could be
avoided by putting the increment operation in a separate statement. The example would then become:
x = b[i] + i;
i++;
function arguments
The order of evaluation of function arguments is unspecified.
x = func( i++, i );
This will give different results depending on which of the function’s two parameters is evaluated first.
function pointers
If a function is called via a function pointer there shall be no dependence on the order in which function designator and function arguments are evaluated.
p->task_start_fn(p++);
function calls
Functions may have additional effects when they are called (e.g. modifying some global data). Dependence on order of evaluation could be avoided by invoking the function prior to the expression that uses it, making use of a temporary variable for the value.
For example
x = f(a) + g(a);
could be written as
x = f(a);
x += g(a);
As an example of what can go wrong, consider an expression to get two values off a stack, subtract the second from the first, and push the result back on the stack:
push( pop() - pop() );
This will give different results depending on which of the pop() function calls is evaluated first (because pop() has side effects).
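As an aside outside the MISRA text, a minimal sketch of one order-independent rewrite, using temporaries in the way the text suggests for function calls. The fixed-size int stack and its push() and pop() are hypothetical and do no bounds checking:

```c
static int stack[16];  /* hypothetical stack, no overflow or underflow checks */
static int top = 0;

static void push(int v) { stack[top++] = v; }
static int pop(void) { return stack[--top]; }

/* "Subtract the second popped value from the first", with the two pops
   placed in separate full expressions so their order is fixed. */
void subtract_top_two(void)
{
    int first = pop();
    int second = pop();
    push(first - second);
}
```

After push(3); push(10); subtract_top_two(); the stack holds the single value 10 - 3 = 7.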
nested assignment statements
Assignments nested within expressions cause additional side effects. The best way to avoid any chance of this leading to a dependence on order of evaluation is to not embed assignments within expressions.
For example, the following is not recommended:
x = y = y = z / 3 ;
x = y = y++;
accessing a volatile
The volatile type qualifier is provided in C to denote objects whose value can change independently of the execution of the program (for example an input register). If an object of volatile qualified type is accessed this may change its value. C compilers will not optimise out reads of a volatile. In addition, as far as a C program is concerned, a read of a volatile has a side effect (changing the value of the volatile). It will usually be necessary to access volatile data as part of an expression, which then means there may be dependence on order of evaluation. Where possible though it is recommended that volatiles only be accessed in simple assignment statements, such as the following:
volatile uint16_t v;
x = v;
The rule addresses the order of evaluation problem with side effects. Note that there may also be an issue with the number of times a sub-expression is evaluated, which is not covered by this rule. This can be a problem with function invocations where the function is implemented as a macro. For example, consider the following function-like macro and its invocation:
#define MAX(a, b) ( ((a) > (b)) ? (a) : (b) )
z = MAX( i++, j );
The definition evaluates the first parameter twice if a > b but only once if a ≤ b. The macro invocation may thus increment i either once or twice, depending on the values of i and j.
It should be noted that magnitude-dependent effects, such as those due to floating-point rounding, are also not addressed by this rule. Although the order in which side-effects occur is undefined, the result of an operation is otherwise well-defined and is controlled by the structure of the expression. In the following example, f1 and f2 are floating-point variables; F3, F4 and F5 denote expressions with floating-point types.
f1 = F3 + (F4 + F5);
f2 = (F3 + F4) + F5;
The addition operations are, or at least appear to be, performed in the order determined by the position of the parentheses, i.e. first F4 is added to F5 then secondly F3 is added to give the value of f1. Provided that F3, F4 and F5 contain no side-effects, their values are independent of the order in which they are evaluated. However, the values assigned to f1 and f2 are not guaranteed to be the same because floating-point rounding following the addition operations will depend on the values being added.
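Both caveats that the rule leaves out can be observed in a small sketch. MAX is the macro quoted above; max_int, the demo functions and the operand values 1e20, -1e20 and 1.0 are hypothetical choices for illustration, and the floating-point results assume IEEE 754 doubles and no value-changing optimizations such as -ffast-math:

```c
#define MAX(a, b) ( ((a) > (b)) ? (a) : (b) )

/* Hypothetical function alternative: each argument is evaluated exactly once. */
static inline int max_int(int a, int b) { return a > b ? a : b; }

/* Well-defined thanks to the ?: sequence point, but (*pi)++ may run twice. */
int max_macro_demo(int *pi, int j)
{
    int z = MAX((*pi)++, j);
    return z;
}

int max_function_demo(int *pi, int j)
{
    return max_int((*pi)++, j);  /* (*pi)++ runs once, before the call */
}

/* Same operands, different grouping, hence different rounding. */
double sum_right_grouped(double f3, double f4, double f5) { return f3 + (f4 + f5); }
double sum_left_grouped(double f3, double f4, double f5)  { return (f3 + f4) + f5; }
```

With i = 5 and j = 1, max_macro_demo returns 6 and leaves i at 7, while max_function_demo returns 5 and leaves i at 6. With the operands 1e20, -1e20 and 1.0, the right-grouped sum is 0.0 (the 1.0 is lost when added to -1e20) and the left-grouped sum is 1.0.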
4. gcc makes an effort to warn about expressions that violate the sequence point rules: -Wsequence-point, which is also enabled by -Wall, produces this warning.
-Wsequence-point
Warn about code that may have undefined semantics because of violations of sequence point rules in the C standard. The C standard defines the order in which expressions in a C program are evaluated in terms of sequence points, which represent a partial ordering between the execution of parts of the program: those executed before the sequence point, and those executed after it. These occur after the evaluation of a full expression (one which is not part of a larger expression), after the evaluation of the first operand of a &&, ||, ? : or , (comma) operator, before a function is called (but after the evaluation of its arguments and the expression denoting the called function), and in certain other places. Other than as expressed by the sequence point rules, the order of evaluation of subexpressions of an expression is not specified. All these rules describe only a partial order rather than a total order, since, for example, if two functions are called within one expression with no sequence point between them, the order in which the functions are called is not specified. However, the standards committee have ruled that function calls do not overlap. It is not specified when between sequence points modifications to the values of objects take effect. Programs whose behavior depends on this have undefined behavior; the C standard specifies that “Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.”. If a program breaks these rules, the results on any particular implementation are entirely unpredictable.
Examples of code with undefined behavior are a = a++;, a[n] = b[n++] and a[i++] = i;. Some more complicated cases are not diagnosed by this option, and it may give an occasional false positive result, but in general it has been found fairly effective at detecting this sort of problem in programs. The present implementation of this option only works for C programs. A future implementation may also work for C++ programs. The C standard is worded confusingly, therefore there is some debate over the precise meaning of the sequence point rules in subtle cases. Links to discussions of the problem, including proposed formal definitions, may be found on the GCC readings page, at /readings.html
5. gcc implements this check as follows:
Walk the tree X, and record accesses to variables.  If X is written by the parent tree, WRITER is the parent. We store accesses in one of the two lists: PBEFORE_SP, and PNO_SP.  If this  expression or its only operand forces a sequence point, then everything up to the sequence point is stored in PBEFORE_SP.  Everything else gets stored in PNO_SP.
Once we return, we will have emitted warnings if any subexpression before such a sequence point could be undefined.  On a higher level, however, the sequence point may not be relevant, and we'll merge the two lists.
Example: (b++, a) + b;
The call that processes the COMPOUND_EXPR will store the increment of B in PBEFORE_SP, and the use of A in PNO_SP. The higher-level call that processes the PLUS_EXPR will need to merge the two lists so that eventually, all accesses end up on the same list (and we'll warn about the unordered subexpressions b++ and b.
A note on merging.  If we modify the former example so that our expression becomes
(b++, b) + a
care must be taken not simply to add all three expressions into the final PNO_SP list.  The function merge_tlist takes care of that by merging the before-SP list of the COMPOUND_EXPR into its after-SP list in a special way, so that no more than one access to B is recorded.
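6. However, gcc's implementation of this warning has four problems:
(1) It gives no warning for structure members (s->a++ = s->a + 5;), because it does not treat s->a as a single object but breaks it into its parts, so it cannot recognize that s->a is a read and s->a++ is a write.
(2) It breaks a[i] into its parts, so it can check "a[i] + i++", but it is powerless against "a[i]++ + a[i]".
(3) It does not run verify_sequence_points on return statements.
(4) It cannot handle aliasing (e.g. p = q; *p++ = q++;), because the front end performs only a simple syntax-tree analysis and cannot go that far.
I have only fixed (1) and (3). (2) is inherently contradictory unless the check is run twice, so I kept the original behavior. (4) is beyond what a front-end check can do. Also, the -Wsequence-point option only works for C; I haven't yet figured out why g++ does not perform this check.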
