How to import / export CSV files with R in SQL Server 2016
Introduction
Importing and exporting CSV files is a common task that DBAs face from time to time.
For import, we can use the following methods:
- OPENROWSET with the BULK option
- Writing a CLR stored procedure or using PowerShell
For export, we can use the following methods:
- Writing a CLR stored procedure or using PowerShell
But to do both the import and the export inside T-SQL, the only way so far has been a custom CLR stored procedure.
This changed with the release of SQL Server 2016, which has R integrated. In this article, we will demonstrate how to use R embedded inside T-SQL to do the import / export work.
R Integration in SQL Server 2016
To use R inside SQL Server 2016, we should first install R Services (In-Database). For detailed installation steps, please see the Microsoft documentation on setting up SQL Server R Services (In-Database).
T-SQL integrates R via a new stored procedure: sp_execute_external_script.
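As a quick sanity check that the integration works, here is a minimal sketch; the option only needs to be enabled once per instance, and SQL Server 2016 may require a service restart before it takes effect:

-- enable external script execution, then run a trivial R script that returns the R version
exec sp_configure 'external scripts enabled', 1;
reconfigure with override;

exec sp_execute_external_script
      @language = N'R'
    , @script = N'OutputDataSet <- data.frame(RVersion = R.version.string)'
with result sets ((RVersion varchar(200)));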
The main purpose of the R language is data analysis, especially statistical analysis. However, since any data analysis work naturally needs to deal with external data sources, among them CSV files, we can use this capability to our advantage.
What is more interesting here is that SQL Server R Services comes with RevoScaleR, an R package enhanced and tailored for SQL Server 2016, which contains some handy functions for moving data in and out of SQL Server.
Environment Preparation
Let’s first prepare some real-world CSV files. I recommend downloading them from an open data portal such as Data.gov.
We will download the first two datasets, “College Scorecard” and “Demographic Statistics By Zip Code”, as CSV files.
After downloading the two files, we can move “Demographic_Statistics_By_Zip_Code.csv” and “Most-Recent-Cohorts-Scorecard-Elements.csv” to a designated folder. In my case, I created a folder C:\RData and put them there.
These two files are fairly typical: Demographic_Statistics_By_Zip_Code.csv contains nothing but numeric values, while the other file has a large number of columns, 122 to be exact.
I will load these two files into my local SQL Server 2016 instance, i.e. [localhost\sql2016], into the [TestDB] database.
Data Import / Export Requirements
We will do the following for these import / export requirements:
1. Import the two csv files into staging tables in [TestDB]. The input parameter is a csv file name
2. Export the staging tables back to a csv file. The input parameters are the staging table name and the csv file name
3. Import / export should be done inside T-SQL
Implementation of Import
In most data loading work, we would first create the staging tables and then start loading. However, with some amazing functions in the RevoScaleR package, this staging table creation step can be omitted, because the R function auto-creates the staging table; that is such a relief when we have to handle a CSV file with 100+ columns.
The implementation is straightforward:
1. Read the csv file with the read.csv R function into variable c, which will be the source (line 7)
2. From the csv file’s full path, extract the file name (without directory and suffix); we will use this file name as the staging table name (lines 8 and 9)
3. Create a SQL Server connection string (line 11)
4. Create a destination SQL Server data source using the RxSqlServerData function (line 12)
5. Use the rxDataStep function to import the source into the destination (line 13)
6. If we want to import a different csv file, we just need to change the first declaration to assign the proper value to @filepath
One special note here: line 11 defines a connection string, and at this moment it seems we need a User ID (UID) and Password (PWD) to avoid problems; if we use Trusted_Connection = True, there can be problems. So in this case, I created a login XYZ and assigned it as a db_owner user in [TestDB].
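A minimal sketch of creating such a login (the password below is only a placeholder; use your own):

-- create the SQL login used in the R connection string and make it db_owner in TestDB
use [master];
create login XYZ with password = N'<StrongPassword-1>';
go
use [TestDB];
create user XYZ for login XYZ;
alter role db_owner add member XYZ;
go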
After this is done, we can check what the new staging table looks like.
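For example, one quick way to inspect the auto-created table from T-SQL (the table name follows from the file name, as the script builds it):

-- list the columns and data types that the R import auto-created
select COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
from [TestDB].INFORMATION_SCHEMA.COLUMNS
where TABLE_NAME = 'Demographic_Statistics_By_Zip_Code'
order by ORDINAL_POSITION;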
We notice that all columns are created with the original column names from the source csv file and with proper data types.
After assigning @filepath = ‘c:/rdata/Most-Recent-Cohorts-Scorecard-Elements.csv’ and re-running the script, we can see that a new table [Most-Recent-Cohorts-Scorecard-Elements] is created with 122 columns.
However, there is a problem with this csv file import, because some csv columns are treated as integers when they should not be. For example, [OPEID] and [OPEID6] should be treated as strings instead, because treating them as integers drops the leading 0s.
When we look at what is inside the table, we notice that in such a scenario we cannot rely on the automatic table creation alone.
To correct this, we have to instruct the R read.csv function by specifying the data type for these two columns, as shown below.
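A sketch of the adjusted read.csv call (only this line of the import script changes; colClasses accepts a named vector of column types):

# read OPEID and OPEID6 as character so their leading zeros are preserved
c <- read.csv(filepath, sep = ",", header = T,
              colClasses = c("OPEID" = "character", "OPEID6" = "character"))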
We can now see the correct values in the [OPEID] and [OPEID6] columns.
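A quick spot check on the reloaded table:

-- the leading zeros should now be intact
select top (5) OPEID, OPEID6
from [TestDB].dbo.[Most-Recent-Cohorts-Scorecard-Elements];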
Implementation of Export
If we want to dump the data out of a table to a csv file, we need to define two input parameters: one is the destination csv file path and the other is a query to select the table.
The beauty of sp_execute_external_script is that it can run a query against a table inside SQL Server via its @input_data_1 parameter, and then pass the result to the embedded R script as a named variable via its @input_data_1_name parameter.
So here are the details:
- Define the csv file full path (line 3); this information is handed to the embedded R script via an input parameter definition (lines 11 and 12) and consumed in line 8
- Define a query to retrieve the data inside the table (lines 4 and 9)
- The query result is passed to the embedded R script as a named variable, SrcTable (via @input_data_1_name), and it is consumed in the R script (line 8) to generate the csv file
We can modify @query to export whatever we want, such as a query with a where clause, or a selection of just some columns instead of all of them.
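Since the export script is only outlined above, here is a minimal sketch of what such a call can look like; the output file name is arbitrary, the table comes from the earlier import, and write.csv is simply one straightforward way to produce the file:

-- a sketch of the export: dump a staging table back out to a csv file
declare @filepath varchar(100) = 'c:/rdata/Demographic_Statistics_By_Zip_Code_export.csv';
declare @query nvarchar(max) = N'select * from dbo.[Demographic_Statistics_By_Zip_Code]';
exec sp_execute_external_script @language = N'R'
, @script = N'
write.csv(SrcTable, file = filepath, row.names = FALSE)  # SrcTable is the @query result, named via @input_data_1_name
'
, @input_data_1 = @query
, @input_data_1_name = N'SrcTable'
, @params = N'@filepath varchar(100)'
, @filepath = @filepath;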
The complete T-SQL script for the import is shown here:
-- import data 1: import from csv file by using default configurations
-- the only input parameter needed is the full path of the source csv file
declare @filepath varchar(100) = 'c:/rdata/Demographic_Statistics_By_Zip_Code.csv' -- using / to replace \
declare @tblname varchar(100);
declare @svr varchar(100) = @@servername;
exec sp_execute_external_script @language = N'R'
, @script = N'
c <- read.csv(filepath, sep = ",", header = T)
filename <- basename(filepath)
filename <- paste("dbo.[", substr(filename,1, nchar(filename)-4), "]", sep="") #remove .csv suffix