Vol. 33, No. 6    ACTA AUTOMATICA SINICA    June, 2007
Role-based Context-specific Multiagent Q-learning
JIANG Da-Wei1    WANG Shi-Yuan1    DONG Yi-Sheng1
Abstract  One of the main problems in cooperative multiagent learning is that the joint action space grows exponentially with the number of agents. In this paper, we investigate a sparse representation of the coordination dependencies between agents, employing roles and context-specific coordination graphs to reduce the joint action space. In our framework, the global joint Q-function is decomposed into a number of local Q-functions. Each local Q-function is shared among a small group of agents and is composed of a set of value rules. We propose a novel multiagent Q-learning algorithm which learns the weights in each value rule automatically. We give empirical evidence to show that our learning algorithm converges to the same optimal policy as traditional multiagent learning techniques, with a significantly faster speed.
Key words  Multiagent Q-learning, multiagent coordination, role, context-specific coordination graph
1 Introduction
A multiagent system (MAS) is a group of agents co-existing in an environment that interact with each other to optimize a performance measure [1,2]. Research in MASs focuses on agent behavior management issues. In this paper, we concentrate on fully cooperative MASs in which all the agents share a common goal. Example scenarios are a team of robots that play football against another team, or a group of robots that plan to build a house. A key aspect of such systems is coordination: the process of ensuring that the individual actions of the agents produce optimal joint decisions for the whole group.
Reinforcement learning (RL) [3,4] techniques have been widely used in many single-agent domains to learn the optimal policy of an agent in uncertain environments. However, directly porting such techniques to multiagent settings is not straightforward [5,6]. In principle, it is possible to treat a MAS as a single 'big' agent and learn the optimal joint policy using standard reinforcement learning techniques such as Q-learning [6,7]. Unfortunately, the joint action space grows exponentially with the number of agents, so even in small settings these techniques are infeasible. On the other hand, we can let each agent learn its policy independently of the other agents [6]. Using this approach, the learning agent only considers its own actions without knowledge of the actions of the other agents. However, the convergence condition of RL no longer holds, since the transition model of the learning agent depends on the policies of the other learning agents [8], and the learning process will oscillate.
A recent approach to decreasing the size of the joint action space uses a context-specific coordination graph (CG) [9∼12]. The idea behind CG is that in many situations only a small number of agents need to coordinate their actions while the rest can act individually. For example, in robotic soccer, only the ball owner and his surrounding players need to coordinate their actions to perform a pass, while the other robots can take their individual decisions. Therefore the global joint Q-function, the representation of the global joint coordination dependencies between all agents, can be approximated as a linear combination of local terms, each representing the local coordination dependencies between a small subgroup of the agents [9]. Unfortunately, it is difficult to apply the original rule-based CG to dynamic domains. Furthermore, until now, only a planning algorithm has been proposed for determining the weights in each value rule of CG [9,10]. Such an algorithm assumes the complete model of the environment as prior knowledge, which is often unavailable in practical MASs.
Received May 18, 2006; in revised form August 11, 2006
Supported by the Natural Science Foundation of Jiangsu Province of P. R. China (BG2004034)
1. School of Computer Science and Engineering, Southeast University, Nanjing 210096, P. R. China
DOI: 10.1360/aas-007-0583
In this work, we propose a formal framework and a novel multiagent Q-learning algorithm to scale multiagent learning up to large, complex systems by use of problem-specific coordination structure. Our framework is built on CG. To overcome the drawback of the original CG in dynamic domains, we use roles [13] to discretize the continuous states. In the off-line design stage, instead of defining value rules for each agent, we define value rules for each role. In the on-line assignment stage, we use a role assignment algorithm to assign a role to each agent, and each agent gets the corresponding value rules associated with that role. Thereafter, the context-specific coordination graph is established. To learn the weights in CG, we propose a novel multiagent Q-learning algorithm. The algorithm approximates the global joint Q-function with a sum of local Q-functions. Each local Q-function is shared among a small number of agents and is the sum of the value rules that describe the coordination dependencies among them. We then update the weight in each value rule by an updating rule derived from standard Q-learning. We call our approach role-based context-specific Q-learning (RQ). Our approach only assumes that the coordination structure of the system is known beforehand. It does not need the complete model of the environment and can easily be applied in both discrete and dynamic domains. We give empirical evidence to show that our learning algorithm converges to the same optimal policy produced by traditional multiagent learning techniques, with a significantly faster learning speed.
We make the following contributions:
• To our knowledge, we are the first to combine roles and context-specific CG to represent the coordination dependencies between the agents in MASs. The original CG can be regarded as a special case of our framework with only one role in the system and that unique role holding all value rules. Therefore, our framework is feasible in both discrete and dynamic domains.
• We propose a novel multiagent Q-learning algorithm to learn the weights in each value rule. The algorithm does not assume the complete model of the environment as prior knowledge. Thus, it is more suitable for real-world MASs than the planning algorithm.
• Experiments show that our algorithm can learn the same optimal policy at a significantly faster speed than traditional multiagent RL.
The paper is organized as follows. In Section 2, we use the collaborative multiagent Markov decision process (CMMDP) framework to formalize the multiagent cooperative learning problem and review different solutions. We then describe our proposed framework and learning algorithm in Section 3. Section 4 experimentally validates the correctness and efficiency of our algorithm. Section 5 concludes.
2 Collaborative multiagent MDPs and reinforcement learning
This section will discuss several multiagent RL methods. First, we formalize the multiagent cooperative learning problem using the CMMDP [12] framework, which extends the traditional single-agent Markov decision process (MDP) to multiagent cooperative settings. In this paper, we use upper case letters (e.g., X) to denote random variables and lower case (e.g., x) to denote their values. We also use boldface to denote vectors of variables (e.g., X) or their values (x).
Definition 1. (Guestrin) A CMMDP Γ is a five-tuple ⟨n, S, A, R, T⟩, where n is the number of agents; S = {S_1, ..., S_n} is a finite joint state space; A = ×_{i=1}^{n} A_i is the joint action space of the n agents; R = Σ_{i=1}^{n} R_i(s, a), where R_i : S × A → R is a reward function which returns the reward R_i(s, a) for agent i when the agents take the joint action a in state s; and T : S × A × S → [0, 1] is a Markovian transition function which describes the probability p(s'|s, a) that the system will move from state s to state s' after the agents perform the joint action a ∈ A. The objective of the agents is to find a joint policy π = {π_{i=1...n}} (where π : S → A and π_i : S → A_i) that optimizes the sum of expected discounted rewards

    Q*(s, a) = max_π Q^π(s, a) = max_π E[ Σ_{t=0}^{∞} γ^t R(s_t, π(s_t)) | s_0 = s, a_0 = a ]    (1)

for each state s, where γ ∈ [0, 1] is a discount factor.
RL [3,4] is successfully used in single-agent domains to estimate Q*(s, a). One of the most important breakthroughs in RL is the development of Q-learning [3,14]. Q-learning begins with an initial estimate Q(s, a) and repeatedly updates it using the following rule

    Q(s, a) ← Q(s, a) + α [R(s, a) + γ max_{a'} Q(s', a') − Q(s, a)]    (2)
where α ∈ (0, 1) is the learning rate. Q-learning converges to the optimal Q*(s, a) under certain conditions [14]. However, extending single-agent RL to multiagent systems is not straightforward [5,7]. We review the existing multiagent RL techniques in the following subsections.
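For concreteness, the following is a minimal C++ sketch of the tabular update in (2) on a toy single-agent chain problem; the environment, reward, and hyper-parameter values are illustrative assumptions and not part of the paper.

// Minimal sketch of the tabular Q-learning update in (2).
// The toy chain environment and the hyper-parameters are illustrative assumptions.
#include <algorithm>
#include <array>
#include <cstdio>
#include <random>

constexpr int kStates  = 5;   // states 0..4, state 4 is terminal
constexpr int kActions = 2;   // 0 = left, 1 = right
constexpr double kAlpha = 0.3, kGamma = 0.9, kEps = 0.1;

int main() {
    std::array<std::array<double, kActions>, kStates> Q{};  // Q(s,a), zero-initialized
    std::mt19937 rng(0);
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    std::uniform_int_distribution<int> randAction(0, kActions - 1);

    for (int episode = 0; episode < 2000; ++episode) {
        int s = 0;
        while (s != kStates - 1) {
            // epsilon-greedy action selection
            int a = (unif(rng) < kEps) ? randAction(rng)
                                       : (Q[s][1] >= Q[s][0] ? 1 : 0);
            // toy deterministic transition: right moves forward, left moves back
            int s2 = (a == 1) ? s + 1 : (s > 0 ? s - 1 : 0);
            double r = (s2 == kStates - 1) ? 1.0 : 0.0;  // reward only at the goal
            // update (2): Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            double maxNext = (s2 == kStates - 1) ? 0.0 : std::max(Q[s2][0], Q[s2][1]);
            Q[s][a] += kAlpha * (r + kGamma * maxNext - Q[s][a]);
            s = s2;
        }
    }
    for (int s = 0; s < kStates; ++s)
        std::printf("Q(%d, left)=%.3f  Q(%d, right)=%.3f\n", s, Q[s][0], s, Q[s][1]);
    return 0;
}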
2.1 Joint action learners
This approach treats the whole MAS as a single 'big' agent. The state and action sets of this agent are the joint state space and the joint action space of the original MAS, respectively [6]. The 'big' agent uses Q-learning to learn the optimal policy. This technique is called joint action learning (JAL). The drawback of JAL is that both the state set and the action set of the learner grow exponentially with the number of agents. Therefore, JAL is infeasible even in small settings [6].
2.2 Independent learners
On the other hand, we can let each agent learn its policy independently, without knowledge of the actions and rewards of the other agents [6,7]. This technique is called independent learners (IL). Although an IL learner does not need to exhaust the exponential joint action space, the convergence proof of Q-learning no longer holds since the transition model depends on the actions of the other learning agents [8]. Despite this drawback, IL has been used in some situations [6].
3 Role-based context-specific Q-learning
This section describes our proposed multiagent learning method. We treat the whole MAS as a single 'big' agent, like JAL, to guarantee the RL convergence condition. The problem is that the joint action space grows exponentially with the number of agents. In general, this problem is intractable [12]. However, many complex systems have a "nearly decomposable, hierarchical structure", with the subsystems interacting only weakly between themselves [16]. This is the type of structure that a human decision maker exploits to solve large-scale problems. We may make the best of such problem-specific structure to scale the multiagent learning algorithm up to large, complex systems. To achieve this goal, we first build a formal framework to represent the coordination dependencies (the problem-specific structure) between the agents. Then we propose a novel multiagent Q-learning algorithm to learn the optimal policy.
3.1 Context-specific coordination graphs and roles

According to [10], the coordination dependencies can be represented by a context-specific coordination graph (CG). First, we define a set of value rules. Each rule defines a context in which the involved agents coordinate their joint action. The rules consistent with a given state are transformed into a CG G = (V, E), with each agent mapped to a node and each coordination dependency to an edge. Only interconnected agents have to coordinate their actions. The global joint Q-function is approximated as the sum of a set of local Q-functions: Q(s, a) = Σ_{i=1}^{n} Q_i(s, a). Each local Q-function is shared by its involved agents Agents[Q_i] = {A_i ∈ A | A_i ∈ Dom[Q_i]} and is the sum of value rules.

Definition 2. (Guestrin) A value rule ρ = ⟨s ∧ a : v⟩ is a function ρ : S × A → R such that ρ(s, a) = v when the current world state is consistent with s and the agents perform the joint action a, and 0 otherwise.
Definition 3. A local Q-function Q_i : S × A → R is shared among its involved agents Agents[Q_i] = {A_i ∈ A | A_i ∈ Dom[Q_i]} and is composed of a set of value rules {ρ_1, ..., ρ_n} such that

    Q_i(s, a) = Σ_{j=1}^{n} ρ_j(s, a)    (3)

where Agents[ρ_j] ∩ Agents[Q_i] ≠ ∅ and ρ_j is consistent with the current state s.
If the weights in all value rules are known, the optimal joint action of the agents a* = argmax_a Q(s, a) can be computed efficiently by the variable elimination (VE) algorithm [10,11]. The drawback of the original rule-based CG is that it is difficult to use in dynamic environments. To tackle that problem, we introduce roles. Our role definition is similar to [13] but adds the associated value rules.

Definition 4. A role is a tuple ⟨m, P_m, r_{i,m}⟩, where m ∈ M is the role's identity; P_m is the set of value rules associated with the role m; and r_{i,m} : r(i, m) → [0, 1] is a potential function which determines how appropriate agent i is for the role m in the current world state.
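To make Definitions 2 and 3 concrete, the sketch below stores a value rule as a partial state context, a partial joint-action context, and a weight, and evaluates a local Q-function as the sum of the rules consistent with the current state-action pair, cf. (3). The integer-map encoding of states and actions and the toy rule weights are our own illustrative assumptions.

// Sketch of value rules (Definition 2) and a local Q-function (Definition 3).
// Representing states and joint actions as integer-valued maps is an assumption.
#include <cstdio>
#include <map>
#include <vector>

using Assignment = std::map<int, int>;  // variable index -> value (state vars or agent actions)

struct ValueRule {
    Assignment stateContext;   // the s-part of <s ^ a : v>
    Assignment actionContext;  // the a-part: agent index -> required action
    double v;                  // the rule's weight (to be learned)
};

// A partial assignment is consistent with a full one if they agree on all shared keys.
bool consistent(const Assignment& partial, const Assignment& full) {
    for (const auto& kv : partial) {
        auto it = full.find(kv.first);
        if (it == full.end() || it->second != kv.second) return false;
    }
    return true;
}

// Local Q-function: sum of the rules consistent with (s, a), cf. (3).
double localQ(const std::vector<ValueRule>& rules,
              const Assignment& s, const Assignment& a) {
    double q = 0.0;
    for (const auto& rho : rules)
        if (consistent(rho.stateContext, s) && consistent(rho.actionContext, a))
            q += rho.v;
    return q;
}

int main() {
    // Two toy rules: rho1 fires whenever agent 0 plays action 1;
    // rho2 additionally requires state variable 0 = 1 and agent 1 playing action 0.
    std::vector<ValueRule> rules = {
        { {},       {{0, 1}},         2.0 },
        { {{0, 1}}, {{0, 1}, {1, 0}}, 5.0 },
    };
    Assignment s = {{0, 1}};          // state variable 0 has value 1
    Assignment a = {{0, 1}, {1, 0}};  // agent 0 plays 1, agent 1 plays 0
    std::printf("Q_local(s,a) = %.1f\n", localQ(rules, s, a));  // prints 7.0
    return 0;
}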
We incorporate roles into CG in two stages: an off-line design stage and an on-line assignment stage. In the off-line design stage, instead of defining value rules for each agent, we define value rules for each role. In the on-line assignment stage, we use a role assignment algorithm to assign a role to each agent. Thereafter, the agent gets the corresponding value rules from that role.

The role assignment algorithm works as follows. Suppose we have n agents. The role assignment algorithm defines a sequence M of roles where |M| ≥ n. The sequence is ordered with respect to the importance of the roles: the most 'important' role is assigned to an agent first, followed by the second most important role, etc. The same role can be assigned to more than one agent, but each agent is assigned only one role. The algorithm is performed by the agents in parallel: each agent computes the potential r_{i,m} for agent i and role m ∈ M. The role m is then assigned to the agent that has the highest potential. The role assignment algorithm runs in time polynomial in the numbers of agents and roles. Each agent calculates O(|M| · n) potentials.
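A minimal sketch of this greedy assignment: roles are processed in order of importance and each goes to the not-yet-assigned agent with the highest potential. The potential table below is a placeholder for the role potential functions r_{i,m}.

// Sketch of the greedy role assignment: roles in order of importance,
// each role goes to the unassigned agent with the highest potential.
// The potential table is an illustrative placeholder for r_{i,m}.
#include <cstdio>
#include <vector>

int main() {
    const int nAgents = 2;
    // potential[m][i]: how appropriate agent i is for role m (role 0 is most important)
    std::vector<std::vector<double>> potential = {
        {0.9, 0.4},   // role 0, e.g. "capturer"
        {0.5, 0.5},   // role 1, e.g. "supporter"
    };
    std::vector<int> roleOf(nAgents, -1);   // assigned role per agent
    std::vector<bool> taken(nAgents, false);

    int assigned = 0;
    for (int m = 0; m < (int)potential.size() && assigned < nAgents; ++m) {
        int best = -1;
        for (int i = 0; i < nAgents; ++i)
            if (!taken[i] && (best < 0 || potential[m][i] > potential[m][best]))
                best = i;
        if (best >= 0) { roleOf[best] = m; taken[best] = true; ++assigned; }
    }
    for (int i = 0; i < nAgents; ++i)
        std::printf("agent %d -> role %d\n", i, roleOf[i]);
    return 0;
}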
3.2 Q-learning in context-specific coordination graphs
Thus far, we have built a formal framework to represent the coordination dependencies between agents. To determine the weights, we propose a multiagent Q-learning algorithm. First, we introduce the local Q-value, which denotes the individual contribution of a single agent to the system.

Definition 5. Q_i(s, a) is the local Q-value for agent i:

    Q_i(s, a) = Σ_j ρ_j^i(s, a) / n_j    (4)

where ρ_j^i(s, a) is a value rule involving agent i and n_j is the number of agents involved in that rule (including agent i).
To derive the updating rule, we regard the weights in the value rules as the parameters of the global Q-function and follow the principle of weight adjustment for function approximation in standard RL [3]. The only difference is that our Q-function is not differentiable.
Lemma 1. The global joint Q-value is the sum of all local Q-values:

    Q(s, a) = Σ_i Q_i(s, a)    (5)

Proof.

    Σ_i Q_i(s, a) = Σ_i Σ_j ρ_j^i(s, a) / n_j = Σ_j Σ_i ρ_j^i(s, a) / n_j = Q(s, a)

The last equality holds because each value rule ρ_j is counted once for each of the n_j agents it involves, so the inner sum over i recovers ρ_j(s, a).
Lemma 2. In each updating step, the local Q-value can be updated according to the following rule:

    Q_i(s, a) ← Q_i(s, a) + α [R_i(s, a) + γ Q_i(s', a*) − Q_i(s, a)]    (6)

where a* = argmax_{a'} Q(s', a').
Proof. We treat the MAS as a single agent. The Q-learning updating rule (2) is rewritten as

    Q(s, a) ← Q(s, a) + α [ Σ_{i=1}^{n} R_i(s, a) + γ Q(s', a*) − Q(s, a) ]    (7)

According to Lemma 1, we replace Q(s, a) with Σ_{i=1}^{n} Q_i(s, a):

    Σ_{i=1}^{n} Q_i(s, a) ← Σ_{i=1}^{n} Q_i(s, a) + α [ Σ_{i=1}^{n} R_i(s, a) + γ Σ_{i=1}^{n} Q_i(s', a*) − Σ_{i=1}^{n} Q_i(s, a) ]    (8)

Decomposing (8) per agent gives

    Q_i(s, a) ← Q_i(s, a) + α [ R_i(s, a) + γ Q_i(s', a*) − Q_i(s, a) ]    (9)

We use ∆Q_i(s, a) = R_i(s, a) + γ Q_i(s', a*) − Q_i(s, a) to denote the increment in (6).
Theorem 1. The value rule ρ_j(s, a) can be updated according to the following rule:

    ρ_j(s, a) ← ρ_j(s, a) + α Σ_{i=1}^{n_j} ∆Q_i(s, a) / n_i    (10)

where n_j is the number of agents involved in ρ_j and n_i is the number of occurrences of agent i in the instantiated value rules which are consistent with state s and joint action a.

Proof. According to the updating rule for the global Q-function, we have

    Σ_j ρ_j(s, a) ← Σ_j ρ_j(s, a) + α Σ_{i=1}^{n} ∆Q_i(s, a)

Since

    ∆Q_i(s, a) = n_i · (∆Q_i(s, a) / n_i) = Σ_{k=1}^{n_i} ∆Q_i(s, a) / n_i

we get

    Σ_j ρ_j(s, a) ← Σ_j ρ_j(s, a) + α Σ_{i=1}^{n} Σ_{k=1}^{n_i} ∆Q_i(s, a) / n_i    (11)

Note that n_i is the number of occurrences of agent i in the value rules that are consistent with the state-action pair (s, a). Thus,

    Σ_{i=1}^{n} Σ_{k=1}^{n_i} ∆Q_i(s, a) / n_i = Σ_j Σ_{i=1}^{n_j} ∆Q_i(s, a) / n_i    (12)

where n_j is the number of agents involved in value rule ρ_j(s, a). Now (11) reads

    Σ_j ρ_j(s, a) ← Σ_j ρ_j(s, a) + α Σ_j Σ_{i=1}^{n_j} ∆Q_i(s, a) / n_i    (13)

and hence

    ρ_j(s, a) ← ρ_j(s, a) + α Σ_{i=1}^{n_j} ∆Q_i(s, a) / n_i    (14)
An example will make things clear. Suppose we have the following rules for states s and s':

    ⟨ρ1; s ∧ a1 : v1⟩          ⟨ρ5; s' ∧ ā1 : v5⟩
    ⟨ρ2; s ∧ a1 ∧ a2 : v2⟩     ⟨ρ6; s' ∧ a1 ∧ ā2 : v6⟩
    ⟨ρ3; s ∧ a2 ∧ ā3 : v3⟩     ⟨ρ7; s' ∧ a2 ∧ ā3 : v7⟩
    ⟨ρ4; s ∧ a2 ∧ a3 : v4⟩     ⟨ρ8; s' ∧ ā2 ∧ ā3 : v8⟩

Furthermore, assume the agents perform the joint action a = {a1, a2, a3} in state s and move to state s'. The optimal joint action produced by VE in state s' is a* = {ā1, a2, ā3}. So the rules ρ1, ρ2, and ρ4 apply in state s and the rules ρ5 and ρ7 apply in state s'.

We update ρ1, ρ2, and ρ4 as follows:

    ∆Q1(s, a) = R1(s, a) + γ v5 − [v1 + v2/2]
    ∆Q2(s, a) = R2(s, a) + γ v7/2 − [v2/2 + v4/2]
    ∆Q3(s, a) = R3(s, a) + γ v7/2 − v4/2

    ρ1(s, a) ← v1 + α ∆Q1(s, a)/2
    ρ2(s, a) ← v2 + α [∆Q1(s, a)/2 + ∆Q2(s, a)/2]
    ρ4(s, a) ← v4 + α [∆Q2(s, a)/2 + ∆Q3(s, a)/1]
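As a sanity check of the update above, the sketch below plugs arbitrary illustrative numbers (not values from the paper) into this example and computes ∆Q1, ∆Q2, ∆Q3 and the updated weights of ρ1, ρ2, and ρ4 according to Theorem 1.

// Numeric check of the worked example: local increments Delta Q_i(s,a)
// and the Theorem 1 updates of rho1, rho2, rho4.
// All numeric values below are arbitrary illustrative choices.
#include <cstdio>

int main() {
    const double alpha = 0.3, gamma = 0.9;
    // rule weights (only v1, v2, v4, v5, v7 enter this particular update)
    const double v1 = 1.0, v2 = 2.0, v4 = 4.0, v5 = 5.0, v7 = 7.0;
    // individual rewards received in (s, a)
    const double R1 = 0.0, R2 = 0.0, R3 = 0.0;

    // Local Q-values (Definition 5): each rule's weight is split among its agents.
    const double Q1 = v1 + v2 / 2;       // agent 1 is in rho1 (1 agent) and rho2 (2 agents)
    const double Q2 = v2 / 2 + v4 / 2;   // agent 2 is in rho2 and rho4 (2 agents each)
    const double Q3 = v4 / 2;            // agent 3 is only in rho4
    const double Q1n = v5;               // in (s', a*): agent 1 is in rho5 (1 agent)
    const double Q2n = v7 / 2;           // agent 2 is in rho7 (2 agents)
    const double Q3n = v7 / 2;           // agent 3 is in rho7

    const double dQ1 = R1 + gamma * Q1n - Q1;
    const double dQ2 = R2 + gamma * Q2n - Q2;
    const double dQ3 = R3 + gamma * Q3n - Q3;

    // Occurrence counts n_i in the rules consistent with (s, a):
    // agent 1 in rho1, rho2 (n_1 = 2); agent 2 in rho2, rho4 (n_2 = 2); agent 3 in rho4 (n_3 = 1).
    const double rho1 = v1 + alpha * (dQ1 / 2);
    const double rho2 = v2 + alpha * (dQ1 / 2 + dQ2 / 2);
    const double rho4 = v4 + alpha * (dQ2 / 2 + dQ3 / 1);

    std::printf("dQ1=%.3f dQ2=%.3f dQ3=%.3f\n", dQ1, dQ2, dQ3);
    std::printf("rho1=%.3f rho2=%.3f rho4=%.3f\n", rho1, rho2, rho4);
    return 0;
}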
The complete learning algorithm is described in Algorithm 1. The algorithm uses the role assignment algorithm to assign roles to the agents and VE to determine the optimal joint action.
Algorithm 1. Role-based context-specific Q-learning (RQ)
Define: ρ = {ρ1, ..., ρn} is the set of rules
Initialize each ρi(s, a) ∈ ρ arbitrarily
T ← 0
repeat {for each episode}
    Observe the current state s
    P ← ∅
    repeat {for each step of the episode}
        for each agent i do
            Assign role m to agent i
            Obtain value rules P_m from role m
            P ← P ∪ P_m
        end for
        Select joint action a following the ε-greedy method
        Perform joint action a and observe the next state s'
        for each ρj ∈ P do
            ρj(s, a) ← ρj(s, a) + α Σ_{i=1}^{n_j} ∆Q_i(s, a) / n_i
        end for
        s ← s'
    until s is terminal
    T ← T + 1
until T > T_max
Using our approach, the convergence condition of Q-learning holds [6]. Unfortunately, it is difficult to provide a formal proof that the learned policy is optimal [3]. However, if the coordination dependencies are sufficiently captured, the differences will be very small.
4 Experiments

In this section, we compare our cooperative learning algorithm with other multiagent RL techniques, in particular the JAL algorithm and the IL algorithm. We apply the three algorithms to the well-known pursuit domain [17], where the goal of the predators is to capture the prey as fast as possible. We set up a problem in which two predators try to capture one prey in a 10×10 grid world. Fig. 1 shows an example. At the beginning of each episode, the predators and the prey are randomly placed at the corners of the world. They can move to their adjacent cells or hold their current position, i.e., dir ∈ D = {center, n, w, s, e}, where moving center stands for holding the current position. The prey is captured when both predators stand on its adjacent cells and only one of them moves to the prey's location. The policy of the prey is fixed: it stays in its current position with probability 0.2 and otherwise moves randomly to one of its free adjacent cells.
Fig. 1  Pursuit domain example: the circles are the predators and the triangle is the prey
For all three algorithms, the learning rate α = 0.3, the discount factor γ = 0.9, and the exploration rate ε = 0.1. We generate 99 × 98 × 5² = 242,550 and 99 × 98 × 5 = 48,510 state-action pairs for JAL and IL, respectively. All the programs are implemented in C++ on an IBM notebook computer.
When applying the RQ algorithm, we introduce two roles: a capturer role, which tries to capture the prey, and a supporter role, which holds its current position when the capturer tries to move to the prey's location and otherwise moves to support the capturer. The role assignment sequence is M = {capturer, supporter}. The potential for the role capturer is based on the Manhattan distance d_{i,p} between predator i and the prey:

    r_{i,capturer} = 1 / d_{i,p}    (15)

The potential r_{i,supporter} is a constant such that the remaining predator is assigned to that role.
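A small sketch of these role potentials, assuming grid coordinates and a Manhattan-distance helper; the handling of a zero distance and the constant value of the supporter potential are illustrative assumptions.

// Sketch of the role potentials in the pursuit experiment:
// capturer potential = 1 / (Manhattan distance to the prey), cf. (15);
// the supporter potential is a constant so the remaining predator gets that role.
#include <cstdio>
#include <cstdlib>

struct Cell { int x, y; };

int manhattan(Cell a, Cell b) { return std::abs(a.x - b.x) + std::abs(a.y - b.y); }

double capturerPotential(Cell predator, Cell prey) {
    int d = manhattan(predator, prey);
    return d > 0 ? 1.0 / d : 1.0;        // zero-distance handling is an assumption
}

const double kSupporterPotential = 0.0;  // constant; its value is an illustrative assumption

int main() {
    Cell prey{5, 5}, p1{2, 5}, p2{9, 9};
    std::printf("r(1,capturer)=%.3f  r(2,capturer)=%.3f\n",
                capturerPotential(p1, prey), capturerPotential(p2, prey));
    // Predator 1 is closer, so it gets the capturer role; predator 2 becomes the supporter.
    return 0;
}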
The following is an example of our generated value rules:

    ⟨ρ1^capturer; a_i = moveTo(dir) : 100⟩
    ⟨ρ2^capturer; has-role-supporter(j) ∧ is-adjacent-to-prey(j) ∧ a_i = moveToPrey() ∧ a_j = moveTo(center) : 100⟩

The value rule ρ1 denotes that the capturer role should try to capture the prey even without the support of the other predator. The rule ρ2 denotes a coordinated context in which the capturer moves to the location of the prey and the supporter holds its current position. In total, we create 124 value rules, which instantiate 5,483 value rules.
The reward that each predator i receives is defined as follows:

    R_i(s, a) =
        50.0     if i captures the prey with the support of the other predator
        −50.0    if i collides with the other predator
        −10.0    if i moves to the prey without support
        −1.0     otherwise                                                      (16)
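The reward in (16) maps directly to a small function; how the simulator detects the three events is not specified here, so the boolean flags below are assumptions.

// Sketch of the individual reward in (16). How the simulator detects the three
// events (capture with support, collision, unsupported move to the prey) is an
// assumption; here they arrive as boolean flags.
#include <cstdio>

double predatorReward(bool capturedWithSupport, bool collided, bool movedToPreyUnsupported) {
    if (capturedWithSupport)     return  50.0;
    if (collided)                return -50.0;
    if (movedToPreyUnsupported)  return -10.0;
    return -1.0;                 // small step penalty otherwise
}

int main() {
    std::printf("%.1f %.1f %.1f %.1f\n",
                predatorReward(true, false, false),
                predatorReward(false, true, false),
                predatorReward(false, false, true),
                predatorReward(false, false, false));
    return 0;
}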
Fig. 2 shows the capture times of the learned policy during the first 400,000 episodes for the three algorithms. From the plot, we can clearly see that both RQ and IL learn very quickly at the beginning compared with JAL, since RQ and IL have fewer state-action pairs than JAL. However, IL cannot converge and keeps oscillating around an average capture time of 17.02. After about 50,000 episodes, our algorithm converges to a stable policy with an average capture time of about 12.92, which is very close to the average capture time of 12.78 learned by JAL. To reach such a result, JAL needs to learn from more than 200,000 episodes. The reason for the small difference is that we have not enumerated all possible coordinated value rules for RQ.
Fig. 2  Learning results of the learned policy for the three methods
5 Conclusion

In this paper, we have proposed a role-based context-specific multiagent Q-learning technique. First, we use roles and context-specific coordination graphs to build a formal framework for a sparse representation of the coordination dependencies between the agents. Then we propose a novel Q-learning algorithm to learn the weights in each value rule automatically. The experiments show that our approach converges to the same policy learned by traditional multiagent RL techniques, with a significantly faster learning speed.
References
1  Weiss G. Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence. USA: MIT Press, 1999. 26∼38
2  Wooldridge M J. Introduction to Multiagent Systems. USA: John Wiley & Sons, 2001. 10∼13
3  Sutton R S, Barto A G. Reinforcement Learning: An Introduction. USA: MIT Press, 1998. 120∼250
4  Kaelbling L P, Littman M, Moore A. Reinforcement learning: a survey. Journal of Artificial Intelligence Research, 1996, 4: 237∼285
5  Claus C, Boutilier C. The dynamics of reinforcement learning in cooperative multiagent systems. In: Proceedings of the Fifteenth National Conference on Artificial Intelligence. Wisconsin, USA, AAAI Press, 1998. 746∼752
6  Tan M. Multi-agent reinforcement learning: independent vs. cooperative agents. Readings in Agents. USA: Morgan Kaufmann, 1997. 487∼494
7  Boutilier C. Planning, learning and coordination in multiagent decision processes. In: Proceedings of the 6th Conference on Theoretical Aspects of Rationality and Knowledge. San Francisco, USA, Morgan Kaufmann, 1996. 195∼210
8  Watkins C J C H, Dayan P. Technical note: Q-learning. Machine Learning, 1992, 8(3): 279∼292
9  Guestrin C, Koller D, Parr R. Multiagent planning with factored MDPs. In: Proceedings of the 14th Neural Information Processing Systems. Cambridge, USA, MIT Press, 2001. 1073∼1080
10 Guestrin C, Venkataraman S, Koller D. Context specific multiagent coordination and planning with factored MDPs. In: Proceedings of the Eighteenth National Conference on Artificial Intelligence (AAAI-2002). Edmonton, Canada, AAAI Press, 2002. 253∼259
11 Guestrin C, Koller D, Parr R, Venkataraman S. Efficient solution algorithms for factored MDPs. Journal of Artificial Intelligence Research, 2003, 19: 399∼468
12 Guestrin C. Planning under Uncertainty in Complex Structured Environments [Ph.D. dissertation], Stanford University, 2004
13 Castelpietra C, Iocchi L, Nardi D, Piaggio M, Scalzo A, Sgorbissa A. Communication and coordination among heterogeneous mid-size players: ART99. Lecture Notes in Computer Science, 2001, 2019: 86∼95
14 Watkins C J C H. Learning from Delayed Rewards [Ph.D. dissertation], Cambridge University, 1989
15 Sen S, Sekaran M, Hale J. Learning to coordinate without sharing information. In: Proceedings of the 12th National Conference on Artificial Intelligence. Seattle, USA, AAAI Press, 1994. 426∼431
16 Simon H. The Architecture of Complexity. USA: MIT Press, 1981. 78∼80
17 Kok J R, Vlassis N. The pursuit domain package [Online], available: staff.science.uva.nl/∼jellekok/software/indexen.html, May 21, 2003
JIANG Da-Wei  Ph.D. candidate in the Department of Computer Science and Technology at Southeast University. His research interests include grid computing, distributed artificial intelligence, and multiagent systems. Corresponding author of this paper. E-mail: davidjiang2005@gmail

WANG Shi-Yuan  Master student in the Department of Computer Science and Technology at Southeast University. Her research interests include P2P computing and multiagent systems. E-mail: desiree wsy@yahoo

DONG Yi-Sheng  Professor at Southeast University. His research interests include grid computing, mobile agent, and multiagent systems. E-mail: ysdong@seu.edu
