sas数据导入终极汇总-之一--688IT编程网

sas数据导⼊终极汇总-之⼀

将数据⽂件读⼊SAS ——DATA Step / PROC IMPORT

1.将SAS⽂件读⼊SAS——

data sasuser.saslin;

set "F:\sas1.sas7bdat";

run;

proc contents data=sasuser.saslin;

run;

2.将其他形式⽂件导⼊成SAS ——PROC IMPORT / 直接读⼊其他形式⽂件

proc import datafile = "c:\data\hsb2.sav" out= work.hsb2;

run;

proc contents data=hsb2;

run;

SAS导⼊数据：SAS recognizes the file type to be imported by file extension.

对数据长度的限制

在⼀些操作环境，SAS假定外部⽂件的纪录对最长为256（⼀⾏数据包括空格等所有字符在内的长度），如果预计读⼊的纪录长度超过256，可在Infile语句中使⽤LRECL=n 这个命令。

读⼊以空格作为分隔符的原始数据

如果原始数据的不同变量之间是以⾄少⼀个空格作为分隔符的，那可以直接采⽤List⽅法将这些数据读⼊SAS。

List Input读数据⾮常⽅便，但也有很多局限性：

（1）不能跳过数据；

（2）所有的缺失值必须以点代替

（3）字符型数据必须是不包含空格的，且长度不能超过8；

（4）不能直接读⼊⽇期型等特殊类型的数据。

程序举例：

INPUT Name $ Age Height;

读⼊按列组织的数据

有些原始数据的变量之间没有空格或其他分隔符，因此这样的⽂件不能以List形式对⼊SAS。但若不同变量值的都在每条记录的固定位置处，则可以按照Column 形式读⼊数据。Colunm读数据⽅法要求所有的数据均为字符型或者标准的数值型（数值中仅包括数字，⼩数点，正负号，或者是E，不包括逗号或⽇期型数据）。

相对于List⽅法，Column读数据⽅法有如下优点：

（1）变量值之间⽆需⽤空格分开；

（2）可以空格表⽰缺失值；

（3）字符型数据中可包括空格；

（4）可跳过数据。

程序举例：

INPUT Name $ 1-10 Age 11-13 Height 14-18;

使⽤格式命令读⼊⾮标准格式的数据

字符型数据： $informat w.

数值型数据： informat w.d

weight的所有形式

⽇期型数据： Datew.

(1)字符型：

$CHARw. ：不删除前后空格，读⼊字符数据；

$HEXw. ：将16进制的数据转化成字符数据；

$w. ：删除前⾯空格，读⼊字符数据；

(2)⽇期，时间或⽇期时间型数据

DATEw. ：以ddmmmyy或ddmmmyyyy形式读⼊⽇期；

DATETIMEw. ：以ddmmmyy hh:mm:ss.ss 形式读⼊⽇期时间；

DDMMYYw. ：以ddmmyy或ddmmyyyy读⼊⽇期；

JULIANw. ：以yyddd或yyyyddd读⼊Julia⽇期；

MMDDYYw. ：以mmddyy或mmddyyyy形式读⼊⽇期；

TIMEw. ：以hh:mm:ss.ss形式读⼊时间；

(3)数值型数据

COMMAw.d ：读⼊数值型数据，将其中的逗号，$ 删除，并将括号转化为负号

HEXw. ：将16进制数据转化成浮点型数据

IBw.d ：读⼊整数⼆进制数据；

PERCENTw. ：将百分数转化为普通数据；

w.d ：读⼊标准的数值型数据。

INPUT Name $16. Age 3. +1 Type $1. +1 Date MMDDYY10.

(Score1 Score2 Score3 Score4 Score5) (4.1);

多种输⼊格式综合

读⼊位置控制——列指针

+n –n ：控制列指针从当前位置向前或向后移动n个字符；

@n ：控制列指针指向

举例：

INPUT ParkName $ 1-22 State $ Year @40 Acreage COMMA9.;

读⼊杂乱数据

在不确定从哪⼀列开始读⼊数据，但知道读⼊的数据均位于某⼀特定字符或字符串之后时，可采⽤@’character’列指针。

如：有字符串如下，需读⼊Breed：后⾯的字符串

My dog Sam Breed: Rottweiler Vet Bills: $478

（1）SAS 语句：Input @’Breed: ’ DogBreed $; 读⼊内容： Rottweil

读⼊Breed：后⾯的字符串，赋给DogBreed，读⼊时到空格时，⾃动结束。

（2）SAS 语句：Input @’Breed:’ DogBreed $20.; 读⼊内容：Rottweiler Vet Bill

读⼊Breed: 后⾯的字符串，赋给DogBreed，读⼊字符串的长度为20。

（3）SAS语句：Input @’Breed:’ DogBreed :$20.; 读⼊内容：Rottweiler

读⼊Breed: 后⾯的字符串，赋给DogBreed，读⼊字符串的长度为20，但遇到空格时不再继续读数据。

从原始数据中读⼊多⾏数据作为SAS的⼀条观测

使⽤⾏指针：

‘ / ’—— 到下⼀⾏读数据

‘#n ’——到第n ⾏读数据

INPUT City $ State $ / NormalHigh NormalLow #3 RecordHigh RecordLow;

从⼀⾏原始数据中读⼊多个观测

在Input语句末尾使⽤@@标⽰，告诉SAS继续读⼊本⾏后⾯的数据。

INPUT City $ State $ NormalRain MeanDaysRain @@;

有选择的读⼊原始数据

SAS让⽤户⽆需读⼊所有的原始数据，然后再判断是否是⽤户所需要的数据。⽤户仅需先读⼊⾜够的变量，以判断该条观测是否为⾃⼰所需，然后在INPUT语句后以@结尾，以使SAS读数据的指针停在此处，保留此⾏数据。然后⽤户使⽤IF语句判断读⼊的观测是否为所需数据，若是，则使⽤第⼆个INPUT语句继续读⼊其余数据。

INPUT Type $ @;

IF Type = ’surface’ THEN DELETE;

INPUT Name $ 9-38 AMTraffic PMTraffic;

@ & @@

(1) 均为锁定数据⾏的标⽰；

(2) @标⽰在SAS进⼊下个循环之前就释放锁定的数据⾏，⽽@@标⽰在继续锁定数据⾏

在INFILE语句中控制输⼊的选项

（1）FIRSTOBS=n : 从n条观测开始读⼊数据

（2）OBS=n 读⼊n条观测

（3）当读进内存的观测长度⼩于INPUT语句设定的长度时

当SAS指针到达⼀条记录的末尾，⽽在INPUT语句中尚有未读⼊的变量时，SAS默认继续读⼊下⼀⾏数据。

MISSOVER：不读⼊下⼀⾏数据，⽽将未赋值的变量以缺失值填充。

TRUNCOVER：当使⽤column或格式化读⼊⽅式时，某些数据⾏长度⼩于其他数据⾏长度时，使⽤TRUNCOVER选项，可防⽌SAS读⼊下⼀⾏数据。

使⽤DATA步读⼊分隔符⽂件

在INFILE语句中使⽤DLM= 选项或者DSD选项可以读⼊以特定符号作为分隔符的原始⽂件。

（1）The DLM= option （i.e. DLM=’&’）

如果是以Tab作为分隔符，则使⽤DLM=’09’X命令

（2）The DSD option：主要有三个功能

忽略单引号内的分隔符；

不将引号作为数据读⼊SAS；

将⼀⾏内连续两个单引号作为⼀个缺失值处理。

使⽤IMPORT程序步读⼊分隔符⽂件

IMPORT 程序的功能

（1）⾃动扫描数据⽂件，并确定变量的类型（数值型或字符型）；

（2）为字符型变量，⾃动设定变量的长度；

（3）识别⼀些⽇期型数据；

（4）将两个连续的分隔符作为⼀个缺失值读⼊SAS

（5）读⼊引号内数据

（6）⾃动将原始数据中不存在的变量赋缺失值；

PROC IMPORT DATAFILE=’filename’ OUT=data-set;

SAS根据读⼊⽂件的扩展名确定⽂件的类型。若读⼊⽂件没有正确的扩展名，或者是DLM⽂件，⽤户必须在IMPORT程序步中使⽤DBMS=option 选项。当读⼊数据集的名称已经存在于SAS库中，可⽤REPLACE选项将原数据覆盖。

PROC IMPORT DATAFILE=’filename’ OUT=data-set DBMS=identifier REPLACE;

在默认情况下，IMPORT程序步将第⼀⾏数据作为变量的名称。若第⼀⾏数据并⾮变量名，可在IMPORT语句后使⽤GETNAMES=NO语句。

若IMPORT程序读⼊的是分隔符⽂件，默认分隔符为空格。若不是，则需使⽤DILIMITER=statement语句指定分隔符。

PROC IMPORT DATAFILE=’filename’ OUT=data-set

DBMS=DLM REPLACE;

GETNAMES=NO;

DELIMITER=’delimiter-character’;

RUN;

使⽤IMPORT程序步读⼊PC⽂件

PROC IMPORT DATAFILE=’filename’ OUT=data-set

DBMS=identifier REPLACE;

列⽰SAS数据集的内容

PROC CONTENTS DATA=data-set;

CONTENTS程序步的功能是显⽰SAS对数据集的具体描述，主要内容有：

（1）数据集描述

数据集的名称；

观测的数量；

变量的数量；

创建⽇期

（2）变量描述

变量类型；

变量长度；

变量的输出格式；

变更的输⼊格式；

变量标识。

实例：

1.读⼊逗号分隔数据：cars_novname.csv

Acura,MDX,SUV,Asia,All,"$36,945 ","$33,337 ",3.5,6,265,17,23,4451,106,189

Acura,RSX Type S 2dr,Sedan,Asia,Front,"$23,820 ","$21,761 ",2,4,200,24,31,2778,101,172

Acura,TSX 4dr,Sedan,Asia,Front,"$26,990 ","$24,647 ",2.4,4,200,22,29,3230,105,183

Acura,TL 4dr,Sedan,Asia,Front,"$33,195 ","$30,299 ",3.2,6,270,20,28,3575,108,186

Acura,3.5 RL 4dr,Sedan,Asia,Front,"$43,755 ","$39,014 ",3.5,6,225,18,24,3880,115,197

proc import datafile="cars_novname.csv" out=mydata dbms=csv replace;

getnames=no;

run;

proc contents data=mydata;

run;

SAS creates default variable names as VAR1-VARn when variables names are not present in the raw data file.

2.读⼊制表键分隔的数据：

proc import datafile="" out=mydata dbms=tab replace;

getnames=no;

run;

3.根据不同任务将不同的数据集永久保存到对应任务的⽂件夹下：

libname dis "c:\dissertation";

proc import datafile="" data dbms=dlm replace;

delimiter='09'x;

getnames=yes;

run;

3.读⼊空格键分隔的数据：

proc import datafile="" out=mydata dbms=dlm replace;

getnames=no;

run;

4.分隔符的终极例⼦：

Other kinds of delimiters

You can use delimiter= on the infile statement to tell SAS what delimiter you are using to separate vari

ables in your raw data file. For example, below we have a raw data file that uses exclamation points ! to separate the variables in the file.

22!2930!4099

17!3350!4749

22!2640!3799

20!3250!4816

15!4080!7827

The example below shows how to read this file by using delimiter='!' on the infile statement.

DATA cars;

INFILE '' DELIMITER='!' ;

INPUT mpg weight price;

RUN;

PROC PRINT DATA=cars;

RUN;

As you can see in the output below, the data was read properly.

OBS MPG WEIGHT PRICE

1 2

2 2930 4099

2 17 3350 4749

3 22 2640 3799

4 20 3250 4816

5 15 4080 7827

It is possible to use multiple delimiters. The example file below uses either exclamation points or plus signs as delimiters.

22!2930!4099

17+3350+4749

22!2640!3799

20+3250+4816

15+4080!7827

By using delimiter='!+' on the infile statement, SAS will recognize both of these as valid delimiters.

DATA cars;

INFILE '' DELIMITER='!+' ;

INPUT mpg weight price;

RUN;

PROC PRINT DATA=cars;

RUN;

As you can see in the output below, the data was read properly.

OBS MPG WEIGHT PRICE

1 2

2 2930 4099

2 17 3350 4749

3 22 2640 3799

4 20 3250 4816

5 15 4080 7827

import缺陷及注意事项：

Proc import does not know the formats for your variables, but it is able to guess the format based on w

hat the beginning of your dataset looks like. Most of the time, this guess is fine. But if the length of a variable differs from beginning to end of your file, you might end up with some truncated values.

重点语法-Infile options

For more complicated file layouts, refer to the infile options described below.

DLM=

The dlm= option can be used to specify the delimiter that separates the variables in your raw data file. For example, dlm=','indicates a comma is the delimiter (e.g., a comma separated file, .csv file). Or, dlm='09'x indicates that tabs are used to separate your variables (e.g., a tab separated file).

DSD

The dsd option has 2 functions. First, it recognizes two consecutive delimiters as a missing value. For example, if your file contained the line 20,30,,50 SAS will treat this as 20 30 50 but with the the dsd option SAS will treat it as 20 30 . 50 , which is probably what you intended. Second, it allows you to include the delimiter within quoted strings. For example, you would want to use the dsd option if you had a comma separated file and your data included values like "George Bush, Jr.". With the dsd option,

SAS will recognize that the comma in "George Bush, Jr." is part of the name, and not a separator indicating a new variable.

FIRSTOBS=

This option tells SAS what on what line you want it to start reading your raw data file. If the first record(s) contains header information such as variable names, then set firstobs=n where n is the record number where the data actually begin. For example, if you are reading a comma separated file or a tab separated file that has the variable names on the first line, then use firstobs=2 to tell SAS to begin reading at the second line (so it will ignore the first line with the names of the variables).

MISSOVER

This option prevents SAS from going to a new input line if it does not find values for all of the variables in the current line of data. For example, you may be reading a space delimited file and that is supposed to have 10 values per line, but one of the line had only 9 values. Without the missover option, SAS will look for the 10th value on the next line of data. If your data is supposed to only have one observation for each line of raw data, then this could cause errors

throughout the rest of your data file. If you have a raw data file that has one record per line, this option is a prudent method of trying to keep such errors from cascading through the rest of your data file.

OBS=

Indicates which line in your raw data file should be treated as the last record to be read by SAS. This is a good option to use for testing your program. For example, you might use obs=100 to just read in the first 100 lines of data while you are testing your program. When you want to read the entire file, you can remove the obs= option entirely.

A typical infile statement for reading a comma delimited file that contains the variable names in the first line of data would be:

INFILE "" DLM=',' DSD MISSOVER FIRSTOBS=2 ;

读⼊有缺失值的数据或者读⼊数值中含有分隔符的数据

DATA cars2;

length make $ 20 ;

INFILE '' DELIMITER=',' DSD ;

INPUT make mpg weight price;

RUN;

PROC PRINT DATA=cars2;

RUN;

48,'Bill Clinton',210

50,'George Bush, Jr.',180

DATA guys2;

length name $ 20 ;

INFILE '' DELIMITER=',' DSD ;

INPUT age name weight ;

RUN;

PROC PRINT DATA=guys2;

RUN;

最经典例⼦：从某⾏开始读⼊数据

DATA cars2;

length nf 8;

INFILE 'F:\cars1.csv' DELIMITER=',' dsd MISSOVER firstobs=2 ;

INPUT nf zh hh xb cs IHA fj;

RUN;

PROC PRINT DATA=cars2;

RUN;

688IT编程网

sas数据导入终极汇总-之一

发表评论

推荐文章

java正则表达式选择题

一种基于正则表达式的DBC文件解析及报文分析方法[发明专利]

工龄小数点提取

非零金额正则表达式

提取文本中数字的函数

热门文章

excel文字递增函数公式

数字递增公式

notepad 正则变量运算

C++regex库常用函数及实例

js正则表达式之前瞻后顾与非捕获分组

indesign正则数字和英文之间的空格

C#匹配中文字符串的4种正则表达式分享

PHP正则表达式匹配中文字符

匹配中文汉字的正则表达式介绍

Python正则表达式如何进行字符串替换

orcl中用正则表达式

sql正则表达式excel

dataframe正则表达式

postgress sql正则

el-upload accept 正则表达式

半小时正则表达式

判断科学计数法的正则

根据url判断静态资源的方法

Java正则表达式-匹配正负浮点数

替换模糊匹配正则-hive

最新文章

一种基于正则表达式的DBC文件解析及报文分析方法[发明专利]

能被5整除的十进制整数的正规表达式

大于0小于等于1的正则表达式

linux grep 26个字母

java pattern 正则表达式

掌握文本编辑器中的搜索和替换技巧

标签列表

688IT编程网

sas数据导入终极汇总-之一

发表评论

推荐文章

java正则表达式 选择题

一种基于正则表达式的DBC文件解析及报文分析方法[发明专利]

工龄小数点提取

非零金额 正则表达式

提取文本中数字的函数

热门文章

excel文字递增函数公式

数字递增公式

notepad 正则变量运算

C++regex库常用函数及实例

js正则表达式之前瞻后顾与非捕获分组

indesign正则数字和英文之间的空格

C#匹配中文字符串的4种正则表达式分享

PHP正则表达式匹配中文字符

匹配中文汉字的正则表达式介绍

Python正则表达式如何进行字符串替换

orcl中用正则表达式

sql正则表达式excel

dataframe正则表达式

postgress sql正则

el-upload accept 正则表达式

半小时 正则表达式

判断科学计数法的正则

根据url判断静态资源的方法

Java正则表达式-匹配正负浮点数

替换模糊匹配正则-hive

最新文章

一种基于正则表达式的DBC文件解析及报文分析方法[发明专利]

能被5整除的十进制整数的正规表达式

大于0小于等于1的正则表达式

linux grep 26个字母

java pattern 正则表达式

掌握文本编辑器中的搜索和替换技巧

标签列表

java正则表达式选择题

非零金额正则表达式

半小时正则表达式