PySpark Create a DataFrame
Create a test file, test.json, containing one JSON object per line.
> echo '{"name":"martin", "status":"cool"}' > test.json
> echo '{"name":"phil", "status":"dull"}' >> test.json
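The two `echo` commands produce a JSON Lines file: one complete JSON object per line, which is the layout `spark.read.json` expects by default. As a minimal sketch, the same records can be serialized from Python with the standard `json` module. The helper names `to_json_lines` and `from_json_lines` are hypothetical.

```python
import json

# The same two records the echo commands write.
records = [
    {"name": "martin", "status": "cool"},
    {"name": "phil", "status": "dull"},
]


def to_json_lines(rows):
    """Serialize each record as one JSON object on its own line (JSON Lines)."""
    return "".join(json.dumps(row) + "\n" for row in rows)


def from_json_lines(text):
    """Parse JSON Lines text back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    # Writes test.json in the current directory, like the echo commands.
    with open("test.json", "w", encoding="utf-8") as f:
        f.write(to_json_lines(records))
```

The only difference from the `echo` output is the whitespace inside each object, which JSON parsers ignore.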
Copy the file into HDFS.
> hdfs dfs -mkdir -p hdfs://localhost:9000/user/martin/
> hdfs dfs -put test.json hdfs://localhost:9000/user/martin/test.json
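If the upload needs to be scripted, the two `hdfs dfs` calls can be wrapped with Python's `subprocess` module. This is a minimal sketch assuming the `hdfs` client is on `PATH` and the NameNode listens on `hdfs://localhost:9000` as in the commands above. `-mkdir -p` creates missing parent directories without failing if they already exist, and `-put -f` overwrites an existing target file. The helper names `upload_commands` and `upload` are hypothetical.

```python
import subprocess

# Assumed NameNode address and target directory, taken from the commands above.
HDFS_DIR = "hdfs://localhost:9000/user/martin"


def upload_commands(local_path, hdfs_dir=HDFS_DIR):
    """Build the argv lists for creating the directory and copying the file."""
    # Keep only the file name of the local path for the HDFS target.
    target = hdfs_dir.rstrip("/") + "/" + local_path.rsplit("/", 1)[-1]
    return [
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],          # create dirs if missing
        ["hdfs", "dfs", "-put", "-f", local_path, target],  # overwrite if present
    ]


def upload(local_path, hdfs_dir=HDFS_DIR):
    """Run each command in order; check=True raises CalledProcessError on failure."""
    for cmd in upload_commands(local_path, hdfs_dir):
        subprocess.run(cmd, check=True)
```

Calling `upload("test.json")` then performs the same two steps as the shell commands. Passing argv lists rather than a single string avoids shell quoting issues.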
Start the PySpark shell.
> pyspark
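Load the file into a DataFrame. When `fs.defaultFS` is `hdfs://localhost:9000`, the relative path is resolved against the current user's HDFS home directory (here `/user/martin`); otherwise, pass the full `hdfs://` URI.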
>>> df = spark.read.json("test.json")
>>> df.show()
+------+------+
|  name|status|
+------+------+
|martin|  cool|
|  phil|  dull|
+------+------+
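The columns appear as `name, status` because Spark's JSON schema inference takes the union of the keys it sees, sorts the fields by name, and fills absent fields with null. `show()` then pads each cell on the left to the column width, which right-aligns the values. The pure-Python sketch that follows imitates that layout for a list of dicts. It is not Spark code, and the function names are hypothetical. It assumes string-valued fields like the sample, and it ignores `show()`'s 20-character truncation as well as Spark's own formatting of numbers and booleans.

```python
def infer_columns(rows):
    """Sorted union of all keys, like the field order Spark infers for JSON."""
    return sorted({key for row in rows for key in row})


def render_table(rows):
    """Render rows roughly like DataFrame.show(): right-aligned cells, nulls as 'null'."""
    columns = infer_columns(rows)
    # Missing keys become "null", as Spark shows absent JSON fields.
    cells = [["null" if row.get(c) is None else str(row[c]) for c in columns]
             for row in rows]
    # Each column is as wide as its longest value or header, with a minimum of 3.
    widths = [max([3, len(c)] + [len(r[i]) for r in cells])
              for i, c in enumerate(columns)]
    border = "+" + "+".join("-" * w for w in widths) + "+"

    def line(values):
        return "|" + "|".join(v.rjust(w) for v, w in zip(values, widths)) + "|"

    out = [border, line(columns), border]
    out += [line(r) for r in cells]
    out.append(border)
    return "\n".join(out)
```

For the two records in test.json, `print(render_table(rows))` prints the same table as `df.show()` above.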
This page was generated by GitHub Pages. Page last modified: 22/11/30 21:22