
How can Java access HDFS running in Docker? #60

Open
Xiazki opened this issue Aug 1, 2018 · 7 comments

Comments

Xiazki commented Aug 1, 2018

I want to operate on HDFS from Java:

FileSystem fs = FileSystem.get(new URI("hdfs://172.18.0.2:9000/"), configuration, "root");
System.out.println("begin copy");
fs.copyFromLocalFile(new Path("/Users/xxx/apps/test/test.log"), new Path("/"));
System.out.println("done!");

Using the IP of the hadoop-master container, I cannot create files on HDFS at all. Following the start script, I added a 0.0.0.0:9000 -> 9000/tcp mapping from the host to port 9000 on hadoop-master; with hdfs://localhost:9000/ the file does get created, but its size is 0.
Any advice would be appreciated, thanks!
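A size-0 file usually means the NameNode RPC (the create call) succeeded but the client could not stream any blocks to the DataNodes, because DataNodes advertise container hostnames/addresses that the host machine cannot reach. Before touching Hadoop configs, a minimal TCP probe can confirm which ports are actually reachable from the client's side. This is a sketch with a hypothetical helper name, not part of this repo:

```python
import socket
from contextlib import closing

def port_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection raises OSError (refused, timeout, or DNS failure) on any problem
        with closing(socket.create_connection((host, port), timeout=timeout)):
            return True
    except OSError:
        return False

# Example: probe the NameNode RPC port and a DataNode transfer port from the host.
# print(port_reachable("localhost", 9000))
# print(port_reachable("hadoop-slave1", 50010))
```

If the NameNode port answers but the DataNode probes fail (unresolvable hostname or refused connection), that matches the empty-file symptom exactly.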

hsipeng commented Oct 1, 2018

Don't modify the script; it is already configured there:

<?xml version="1.0"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop-master:9000/</value>
    </property>
</configuration>

You only need to access host-address:9000.
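To double-check which NameNode address a client will actually use, you can read fs.defaultFS straight out of core-site.xml. A minimal stdlib sketch (read_default_fs is a hypothetical helper, not part of this repo), using the config quoted above:

```python
import xml.etree.ElementTree as ET

CORE_SITE = """<?xml version="1.0"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop-master:9000/</value>
    </property>
</configuration>"""

def read_default_fs(core_site_xml: str):
    """Return the fs.defaultFS value from a core-site.xml document, or None."""
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name") == "fs.defaultFS":
            return prop.findtext("value")
    return None

print(read_default_fs(CORE_SITE))  # hdfs://hadoop-master:9000/
```

Note that the value is a hostname (hadoop-master), not an IP: a client outside the Docker network must be able to resolve that name, or connect via a published port on the host.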

@xmzDesign

Same question. Looking at the startup parameters, only 8088 and 50070 are exposed, so how would the host reach port 9000? I tried host-address:9000 and the connection was refused:
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
at org.apache.hadoop.ipc.Client.call(Client.java:1451)

@JianxunRao

Same question. On Windows, I cannot access HDFS through Python's hdfs module, even after adding a mapping for port 9000.
Here is start-container.sh:

#!/bin/bash

# the default node number is 3
N=${1:-3}


# start hadoop master container
docker rm -f hadoop-master &> /dev/null
echo "start hadoop-master container..."
docker run -itd \
                --net=hadoop \
                -p 50070:50070 \
                -p 8088:8088 \
                -p 9000:9000 \
                --name hadoop-master \
                --hostname hadoop-master \
                kiwenlau/hadoop:1.0 &> /dev/null


# start hadoop slave container
i=1
while [ $i -lt $N ]
do
	docker rm -f hadoop-slave$i &> /dev/null
	echo "start hadoop-slave$i container..."
	docker run -itd \
	                --net=hadoop \
	                --name hadoop-slave$i \
	                --hostname hadoop-slave$i \
	                kiwenlau/hadoop:1.0 &> /dev/null
	i=$(( $i + 1 ))
done 

# get into hadoop master container
docker exec -it hadoop-master bash

Here is the Python test client:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time    : 2018/12/6 18:56
# @Author  : Trojx
# @File    : hdfs_demo.py

from hdfs import InsecureClient
import time

if __name__ == '__main__':
    root_path = "/"
    c = InsecureClient(url="http://localhost:50070", user='root', root=root_path)
    c.makedirs('/user/root/pyhdfs')
    c.write('/user/root/pyhdfs/1.log', time.asctime(time.localtime(time.time())) + '\n', True)
    c.download('/user/root/pyhdfs/1.log', '.', True)
    c.upload('/user/root/pyhdfs/', './pyhdfs_example.py', True)
    hdfs_files = c.list('/user/root/pyhdfs', True)
    for f in hdfs_files:
        print(f)
    print(c.content('/user/root/pyhdfs/pyhdfs_example.py'))
    print(c.checksum('/user/root/pyhdfs/pyhdfs_example.py'))
    c.delete('/user/root/pyhdfs/', True)

Here is the error:

D:\PycharmProjects\hadoop-cluster-docker\venv\Scripts\python.exe D:/PycharmProjects/hadoop-cluster-docker/hdfs_demo.py
Traceback (most recent call last):
  File "D:\PycharmProjects\hadoop-cluster-docker\venv\lib\site-packages\urllib3\connection.py", line 159, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw)
  File "D:\PycharmProjects\hadoop-cluster-docker\venv\lib\site-packages\urllib3\util\connection.py", line 57, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "C:\Program Files\Python37\lib\socket.py", line 748, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 11001] getaddrinfo failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\PycharmProjects\hadoop-cluster-docker\venv\lib\site-packages\urllib3\connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "D:\PycharmProjects\hadoop-cluster-docker\venv\lib\site-packages\urllib3\connectionpool.py", line 354, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "C:\Program Files\Python37\lib\http\client.py", line 1229, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "C:\Program Files\Python37\lib\http\client.py", line 1275, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "C:\Program Files\Python37\lib\http\client.py", line 1224, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "C:\Program Files\Python37\lib\http\client.py", line 1016, in _send_output
    self.send(msg)
  File "C:\Program Files\Python37\lib\http\client.py", line 956, in send
    self.connect()
  File "D:\PycharmProjects\hadoop-cluster-docker\venv\lib\site-packages\urllib3\connection.py", line 181, in connect
    conn = self._new_conn()
  File "D:\PycharmProjects\hadoop-cluster-docker\venv\lib\site-packages\urllib3\connection.py", line 168, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x0000029A4D2D6198>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\PycharmProjects\hadoop-cluster-docker\venv\lib\site-packages\requests\adapters.py", line 449, in send
    timeout=timeout
  File "D:\PycharmProjects\hadoop-cluster-docker\venv\lib\site-packages\urllib3\connectionpool.py", line 638, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "D:\PycharmProjects\hadoop-cluster-docker\venv\lib\site-packages\urllib3\util\retry.py", line 398, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='hadoop-slave2', port=50075): Max retries exceeded with url: /webhdfs/v1/user/root/pyhdfs/1.log?op=CREATE&user.name=root&namenoderpcaddress=hadoop-master:9000&overwrite=true&user.name=root (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000029A4D2D6198>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:/PycharmProjects/hadoop-cluster-docker/hdfs_demo.py", line 20, in <module>
    c.write('/user/root/pyhdfs/1.log', time.asctime(time.localtime(time.time())) + '\n', True)
  File "D:\PycharmProjects\hadoop-cluster-docker\venv\lib\site-packages\hdfs\client.py", line 476, in write
    consumer(data)
  File "D:\PycharmProjects\hadoop-cluster-docker\venv\lib\site-packages\hdfs\client.py", line 468, in consumer
    data=(c.encode(encoding) for c in _data) if encoding else _data,
  File "D:\PycharmProjects\hadoop-cluster-docker\venv\lib\site-packages\hdfs\client.py", line 214, in _request
    **kwargs
  File "D:\PycharmProjects\hadoop-cluster-docker\venv\lib\site-packages\requests\sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "D:\PycharmProjects\hadoop-cluster-docker\venv\lib\site-packages\requests\sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "D:\PycharmProjects\hadoop-cluster-docker\venv\lib\site-packages\requests\adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='hadoop-slave2', port=50075): Max retries exceeded with url: /webhdfs/v1/user/root/pyhdfs/1.log?op=CREATE&user.name=root&namenoderpcaddress=hadoop-master:9000&overwrite=true&user.name=root (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000029A4D2D6198>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed'))

Process finished with exit code 1
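The gaierror above is the real clue: the WebHDFS CREATE request was redirected by the NameNode to hadoop-slave2:50075, a container hostname that Windows cannot resolve. One common workaround (assuming the DataNode HTTP port 50075 is also published to the host, which the stock slave loop does not do) is to map each container hostname to the Docker host in the client machine's hosts file. A sketch that generates those entries, matching the naming scheme in start-container.sh (hosts_entries is a hypothetical helper):

```python
def hosts_entries(host_ip: str, n: int = 3) -> list:
    """Hosts-file lines mapping hadoop-master and hadoop-slave1..(n-1) to one IP,
    mirroring start-container.sh, which starts the master plus n-1 slaves."""
    names = ["hadoop-master"] + ["hadoop-slave%d" % i for i in range(1, n)]
    return ["%s\t%s" % (host_ip, name) for name in names]

for line in hosts_entries("127.0.0.1"):
    print(line)
```

Appending these lines to C:\Windows\System32\drivers\etc\hosts (or /etc/hosts on Linux/macOS) fixes the name resolution; the redirected write can still only succeed if the DataNode ports are reachable from the host as well, e.g. published with additional -p flags.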

@wushuaiping

Same problem as the comments above.


byrChen commented Apr 5, 2019

Is this project only for testing wordcount, or can it be developed further to do other work with Hadoop?

@acse-hy23

https://blog.csdn.net/sunrising_hill/article/details/53559398
Have you tried making the changes described here?

@chankamlam

> https://blog.csdn.net/sunrising_hill/article/details/53559398
> Have you tried making the changes described here?

It didn't work.

8 participants