Distributed File System: HDFS
I. Overview and Design Goals
A distributed file system keeps multiple replicas of each file, so that when one node goes down the data can still be read from a replica on another node, which improves reliability. That is the traditional design, but it has drawbacks:
1) No matter how large a file is, it is stored whole on a single node, which makes parallel processing difficult, turns that node into a network bottleneck, and is a poor fit for big-data processing;
2) The storage load is unbalanced, and the utilization of each node is low.
What is HDFS?
Hadoop implements a distributed file system, the Hadoop Distributed File System, or HDFS for short.
It originates from Google's GFS paper.
Design goals of HDFS:
A very large distributed file system
Runs on ordinary, inexpensive hardware
Easy to scale out
HDFS architecture diagram:
A file is split into multiple Blocks.
blocksize: 128M
A 130M file ==> 2 Blocks: 128M and 2M
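How a stored file was actually split can be checked through the FileSystem API. The sketch below is not part of the original article: it assumes the pseudo-distributed cluster set up later in this article (fs.defaultFS = hdfs://192.168.56.102:8020, user root) and a hypothetical, already-uploaded file /hdfsapi/test/big.file.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockInfo {
    public static void main(String[] args) throws Exception {
        // connect to the pseudo-distributed HDFS described later in this article
        FileSystem fs = FileSystem.get(new URI("hdfs://192.168.56.102:8020"), new Configuration(), "root");
        // /hdfsapi/test/big.file is only an illustrative path
        FileStatus status = fs.getFileStatus(new Path("/hdfsapi/test/big.file"));
        System.out.println("block size: " + status.getBlockSize()); // 134217728 (128M) by default
        // one BlockLocation per block; a 130M file yields two entries (128M + 2M)
        for (BlockLocation block : fs.getFileBlockLocations(status, 0, status.getLen())) {
            System.out.println("offset=" + block.getOffset() + ", length=" + block.getLength());
        }
        fs.close();
    }
}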
NN (NameNode):
1) Responds to client requests
2) Manages the metadata
DN (DataNode):
1) Stores the data blocks (Blocks) that make up users' files
2) Periodically sends heartbeat messages to the NN, reporting itself, all of its blocks, and its health status
A typical deployment runs one NameNode, with every other machine in the cluster running a DataNode. In a real production environment it is recommended to deploy the NameNode and the DataNodes on separate nodes.
II. Setting Up a Single-Node Pseudo-Distributed Cluster
Environment: CentOS 7
1. Install the JDK
(omitted)
2. Install SSH
sudo yum install ssh
ssh-keygen -t rsa
cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
3. Install Hadoop
1) Download it from the official site; I chose the third-party commercial distribution CDH, hadoop-2.6.0-cdh5.7.0.
2) Extract it: tar -zxvf hadoop-2.6.0-cdh5.7.0.tar.gz -C ~/app/
4. Edit the configuration files
etc/hadoop/core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://192.168.56.102:8020</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/root/app/tmp</value>
</property>
</configuration>
etc/hadoop/hdfs-site.xml (replication is set to 1 because there is only a single DataNode):
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
5. Start HDFS
Format the NameNode (only needed on the first run):
cd bin
./hadoop namenode -format
Then start HDFS (start-dfs.sh lives in sbin):
cd ../sbin
./start-dfs.sh
Verify that it started successfully (the NameNode web UI should also be reachable at http://192.168.56.102:50070):
jps
Jps
SecondaryNameNode
DataNode
NameNode
6. Stop HDFS
cd sbin
./stop-dfs.sh
III. Operating on HDFS Files with the Java API
Create a Java project with IDEA + Maven.
Add the HDFS-related dependency in pom.xml (the cloudera repository is needed because the CDH artifacts are not published to Maven Central):
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.imooc.hadoop</groupId>
<artifactId>hadoop-train</artifactId>
<version>1.0</version>
<name>hadoop-train</name>
<!-- FIXME change it to the project's website -->
<url>http://www.example.com</url>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<hadoop.version>2.6.0-cdh5.7.0</hadoop.version>
</properties>
<repositories>
<repository>
<id>cloudera</id>
<url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>
</repositories>
<dependencies>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.11</version>
<scope>test</scope>
</dependency>
</dependencies>
<build>
<pluginManagement><!-- lock down plugins versions to avoid using Maven defaults (may be moved to parent pom) -->
<plugins>
<plugin>
<artifactId>maven-clean-plugin</artifactId>
<version>3.0.0</version>
</plugin>
<!-- see http://maven.apache.org/ref/current/maven-core/default-bindings.html#Plugin_bindings_for_jar_packaging -->
<plugin>
<artifactId>maven-resources-plugin</artifactId>
<version>3.0.2</version>
</plugin>
<plugin>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.7.0</version>
</plugin>
<plugin>
<artifactId>maven-surefire-plugin</artifactId>
<version>2.20.1</version>
</plugin>
<plugin>
<artifactId>maven-jar-plugin</artifactId>
<version>3.0.2</version>
</plugin>
<plugin>
<artifactId>maven-install-plugin</artifactId>
<version>2.5.2</version>
</plugin>
<plugin>
<artifactId>maven-deploy-plugin</artifactId>
<version>2.8.2</version>
</plugin>
</plugins>
</pluginManagement>
</build>
</project>
HDFSApp.java
package com.imooc.hadoop.hdfs;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Progressable;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.net.URI;
/**
* Hadoop HDFS Java API operations
*/
public class HDFSApp {
public static final String HDFS_PATH = "hdfs://192.168.56.102:8020";
FileSystem fileSystem = null;
Configuration configuration = null;
/**
* Get hold of a FileSystem client before each test.
*/
@Before
public void setUp() throws Exception {
configuration = new Configuration();
// "root" is the user assumed to own the pseudo-distributed cluster built above
fileSystem = FileSystem.get(new URI(HDFS_PATH), configuration, "root");
}
/**
* Release the resources after each test.
*/
@After
public void tearDown() throws Exception {
fileSystem = null;
configuration = null;
}
/**
* Create an HDFS directory
* @throws Exception
*/
@Test
public void mkdir()throws Exception {
fileSystem.mkdirs(new Path("/hdfsapi/test"));
}
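/**
* (Not in the original article) A small extra check, assuming the mkdirs call above succeeded:
* FileSystem#exists reports whether the new directory is now present.
*/
@Test
public void exists() throws Exception {
System.out.println(fileSystem.exists(new Path("/hdfsapi/test")));
}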
/**
* Create a file
*/
@Test
public void create()throws Exception {
FSDataOutputStream output = fileSystem.create(new Path("/hdfsapi/"));
output.write("hello hadoop".getBytes());
output.flush();
output.close();
}
/**
* View the contents of a file on HDFS
*/
@Test
public void cat()throws Exception {
FSDataInputStream in = fileSystem.open(new Path("/hdfsapi/"));
// copy the file contents to standard output
IOUtils.copyBytes(in, System.out, 1024);
in.close();
}
/**
* Rename a file
*/
@Test
public void rename()throws Exception {
Path oldPath =new Path("/hdfsapi/");
Path newPath =new Path("/hdfsapi/");
fileSystem.rename(oldPath, newPath);
}
/**
* Upload a local file to HDFS
*/
@Test
public void copyFromLocalFile()throws Exception {
Path localPath =new Path("/Users/chen/");
Path hdfsPath =new Path("/hdfsapi/test");
fileSystem.copyFromLocalFile(localPath, hdfsPath);
}
/**
* Upload a local file to HDFS, reporting progress
*/
@Test
public void copyFromLocalFileWithProgress()throws Exception {
Path localPath =new Path("/Users/chen/");
Path hdfsPath =new Path("/hdfsapi/test");
InputStream in =new BufferedInputStream(
new FileInputStream(
new File("/Users/chen/Downloads/")));
FSDataOutputStream output = fileSystem.create(new Path("/hdfsapi/test/hive1."),
new Progressable() {
public void progress() {
// print one dot for each data packet written, as a simple progress indicator
System.out.print(".");
}
});
IOUtils.copyBytes(in, output, 4096);
}
}
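To double-check what the upload tests left behind, the following standalone sketch (not in the original article) lists everything under /hdfsapi/test; it assumes the same HDFS address and the same "root" user as above.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListHDFSApp {
    public static void main(String[] args) throws Exception {
        FileSystem fileSystem = FileSystem.get(new URI("hdfs://192.168.56.102:8020"), new Configuration(), "root");
        // print type, replication factor, length and path for every entry under /hdfsapi/test
        for (FileStatus status : fileSystem.listStatus(new Path("/hdfsapi/test"))) {
            String type = status.isDirectory() ? "directory" : "file";
            System.out.println(type + "\t" + status.getReplication() + "\t" + status.getLen() + "\t" + status.getPath());
        }
        fileSystem.close();
    }
}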