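Summary of Panel Data and Pooled Data Analysis
Summary of Panel Data
If you would rather skip the theory, jump straight to page 11 for the EViews operations.
(I) Concepts
1. Gujarati treats panel data as a pooling of cross-section and time-series data, while noting that the units observed in a panel are fixed. See Basic Econometrics, 4th ed. (English edition), Chapter 17, p. 636.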
In Chapter 1 we discussed briefly the types of data that are generally available for empirical analysis, namely, time series, cross section, and panel.
In time series data we observe the values of one or more variables over a period of time (e.g., GDP for several quarters or years). In cross-section data, values of one or more variables are collected for several sample units, or entities, at the same point in time (e.g., crime rates for 50 states in the United States for a given year).
In panel data the same cross-sectional unit (say a family or a firm or a state) is surveyed over time. In short, panel data have space as well as time dimensions.
There are other names for panel data, such as pooled data (pooling of time series and cross-sectional observations), combination of time series and cross-section data, micropanel data, longitudinal data (a study over time of a variable or group of subjects), event history analysis (e.g., studying the movement over time of subjects through successive states or conditions), cohort analysis (e.g., following the career path of 1965 graduates of a business school). Although there are subtle variations, all these names essentially connote movement over time of cross-sectional units. We will therefore use the term panel data in a generic sense to include one or more of these terms. And we will call regression models based on such data panel data regression models.
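2. Wooldridge distinguishes panel data from an independently pooled cross section (independent cross sections pooled over time). Like Gujarati, he holds that the units observed in a panel stay the same across periods. In an independently pooled cross section, by contrast, the units are randomly sampled in each period; the resulting method is to let the intercept, or the slopes, vary over time when analyzing such data. See Introductory Econometrics, 2nd ed. (English edition), Chapter 13, p. 408.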
We will analyze two kinds of data sets in this chapter. An independently pooled cross section is obtained by sampling randomly from a large population at different points in time (usually, but not necessarily, different years). For instance, in each year, we can draw a random sample on hourly wages, education, experience, and so on, from the population of working people in the United States. Or, in every other year, we draw a random sample on the selling price, square footage, number of bathrooms, and so on, of houses sold in a particular metropolitan area. From a statistical standpoint, these data sets have an important feature: they consist of independently sampled observations. This was also a key aspect in our analysis of cross-sectional data: among other things, it rules out correlation in the error terms for different observations.
An independently pooled cross section differs from a single random sample in that sampling from the population at different points in time likely leads to observations that are not identically distributed. For example, distributions of wages and education have changed over time in most countries. As we will see, this is easy to deal with in practice by allowing the intercept in a multiple regression model, and in some cases the slopes, to change over time. We cover such models in Section 13.1. In Section 13.2, we discuss how pooling cross sections over time can be used to evaluate policy changes.
A panel data set, while having both a cross-sectional and a time series dimension, differs in some important respects from an independently pooled cross section. To collect panel data—sometimes called longitudinal data—we follow (or attempt to follow) the same individuals, families, firms, cities, states, or whatever, across time. For example, a panel data set on individual wages, hours, education, and other factors is collected by randomly selecting people from a population at a given point in time. Then, these same people are reinterviewed at several subsequent points in time. This gives us data on wages, hours, education, and so on, for the same group of people in different years.
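To make the time-varying intercept in Wooldridge's description concrete, here is a minimal Python sketch. It is only an illustration: the wage and education numbers are simulated, and the sample size, coefficients, and variable names are all hypothetical rather than taken from the textbook. Two independent samples are drawn in two different years, pooled, and fitted by OLS with a year dummy whose coefficient captures the intercept shift.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500  # hypothetical number of independent draws per year

# Two independently sampled cross sections; nobody is followed across years.
educ_y1 = rng.uniform(8, 16, n)
educ_y2 = rng.uniform(8, 16, n)
wage_y1 = 1.0 + 0.5 * educ_y1 + rng.normal(0, 0.5, n)
wage_y2 = 3.0 + 0.5 * educ_y2 + rng.normal(0, 0.5, n)  # intercept is 2.0 higher

# Pool the two samples and add a dummy equal to 1 for second-year observations.
educ = np.concatenate([educ_y1, educ_y2])
wage = np.concatenate([wage_y1, wage_y2])
d2 = np.concatenate([np.zeros(n), np.ones(n)])

# OLS of wage on [1, d2, educ]; the coefficient on d2 is the intercept shift.
X = np.column_stack([np.ones(2 * n), d2, educ])
beta, *_ = np.linalg.lstsq(X, wage, rcond=None)
intercept, year_shift, slope = beta
```

In this simulated setup `year_shift` should land near 2.0 and `slope` near 0.5. Letting the slope change over time as well would mean adding the interaction `d2 * educ` as a further column of `X`.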
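3. EViews (EViews 7 guide, p. 564)
EViews distinguishes panel data from pooled time-series, cross-section data. Data with long time series and few cross-sections are called pooled time-series, cross-section data; data with many cross-sections and short time series are called panel data.
In terms of layout, pooled time-series, cross-section data are arranged by cross-section, whereas panel data are arranged in stacked form (all observations of one variable are kept together, separate from the observations of other variables).
Here, data can be stored as unstacked data or as stacked data. Unstacked data: the observations for a given cross-section member and variable are kept together, but separate from the data of other variables and other cross-section members. Gao Tiemei (高铁梅), 2nd ed., p. 334:
Stacked data come in two forms, stacked by cross-section member or stacked by date:
Stacked by cross-section member:
Stacked by date: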
Generally speaking, we distinguish between the two by noting that pooled time-series, cross-section data refer to data with relatively few cross-sections, where variables are
held in cross-section specific individual series, while panel data correspond to data with large numbers of cross-sections, with variables held in single series in stacked form.
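The unstacked and stacked layouts described above can be sketched in pandas. This is an illustration only, not EViews syntax: the toy values, the members `a` and `b`, and the `variable_member` column naming are all hypothetical. The unstacked frame holds one series per variable and cross-section member; the stacked frames hold one column per variable, ordered either by member or by date.

```python
import pandas as pd

# Unstacked layout: one column per (variable, cross-section member) pair.
unstacked = pd.DataFrame(
    {
        "gdp_a": [1.0, 1.1, 1.2],
        "gdp_b": [2.0, 2.1, 2.2],
        "inv_a": [0.3, 0.4, 0.5],
        "inv_b": [0.6, 0.7, 0.8],
    },
    index=pd.Index([2001, 2002, 2003], name="year"),
)

# Melt to one row per (year, series), then split "gdp_a" into "gdp" and "a".
long = unstacked.reset_index().melt(id_vars="year", var_name="series")
long[["variable", "member"]] = long["series"].str.split("_", expand=True)

# Stacked layout: one column per variable, one row per (year, member) pair.
stacked = long.pivot(
    index=["year", "member"], columns="variable", values="value"
).reset_index()

# Stacked by cross-section member: all years of "a", then all years of "b".
by_member = stacked.sort_values(["member", "year"], ignore_index=True)

# Stacked by date: all members for 2001, then all members for 2002, and so on.
by_date = stacked.sort_values(["year", "member"], ignore_index=True)
```

Both stacked frames contain the same six rows; only the sort order differs, which is the distinction between stacking by cross-section member and stacking by date.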
(II) Definitions of a balanced panel
1. Gujarati, p. 640
If each cross-sectional unit has the same number of time series observations, then such a panel (data) is called a balanced panel. In the present example we have a balanced panel, as each company in the sample has 20 observations.
If the number of observations differs among panel members, we call such a panel an unbalanced panel. In this chapter we will largely be concerned with a balanced panel.
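If every cross-sectional unit has the same number of time-series observations, the panel is balanced (for example, every unit has all 30 years of data for 1979-2008, with no gaps); if the number of observations differs across cross-sectional units, the panel is unbalanced (for example, some units have only 28 observations).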
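As a rough check of this definition, the following pandas sketch tests whether a stacked data set is balanced. The helper `is_balanced` and the column names `firm` and `year` are hypothetical, and the check assumes a stricter reading in which every unit must appear in every period present in the sample.

```python
import pandas as pd


def is_balanced(panel: pd.DataFrame, unit: str, time: str) -> bool:
    """Return True if every unit is observed in every period of the sample."""
    periods_per_unit = panel.groupby(unit)[time].nunique()
    return bool((periods_per_unit == panel[time].nunique()).all())


# Toy panel: two firms observed in two years each.
balanced = pd.DataFrame(
    {
        "firm": ["A", "A", "B", "B"],
        "year": [2007, 2008, 2007, 2008],
        "invest": [1.0, 1.2, 0.8, 0.9],
    }
)

# Dropping firm B's 2008 row leaves B with fewer observations than A.
unbalanced = balanced.iloc[:-1]

balanced_ok = is_balanced(balanced, "firm", "year")
unbalanced_ok = is_balanced(unbalanced, "firm", "year")
```

Here `balanced_ok` is `True` and `unbalanced_ok` is `False`, because firm B is missing its 2008 observation in the second frame.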
2. Wooldridge
The definition is the same as above.
If we have the same T time periods for each of N cross-sectional units, we say that the data set is a balanced panel. (p. 430)
Some panel data sets, especially on individuals or firms, have missing years for at least some cross-sectional units in the sample. In this case, we call the data set an unbalanced panel. (p. 448)
3. EViews 7 guide, p. 39
The definition is the same as above.
This entry may be used when you wish to create a balanced structure in which every cross section follows the same regular frequency with the same date observations. Only the barest outlines of the procedure are provided here since a proper discussion
