Golang implementation of MapReduce

This is our EE447 final project; the idea comes from the MIT 6.824 course project. Contributors: @sun-lingyu, @yifanlu0227, @Nicholas0228

Required packages & how to install

  • golang 1.15+

  • crypto/ssh : go get golang.org/x/crypto/ssh

  • python3-dev : sudo apt-get install python3-dev -y

  • python2-dev : sudo apt-get install python2.7-dev -y

Optional packages & how to install

  • nltk (for word count & inverted index examples): pip3 install nltk ; pip2 install nltk==3.0.0
  • numpy (for KNN example): pip install numpy

Usage

First, run git clone https://github.com/yifanlu0227/mapreduce.git to download this repository to each of your machines. Select one machine to be the coordinator and the others to be workers.

Edit your workers' IP addresses, username, and password in mapreduce/src/main/mrcoordinator.go, as follows.

	hosts := []string{"192.168.0.132", "192.168.0.184", "192.168.0.33", "192.168.0.199"}
	command := "go run mrworker.go " + os.Args[1]
	mr.AwakenWorkers("root", "Ydhlw123", hosts, command)

Also make sure that ports 1234 and 8081 are available, since we use them for our RPC and HTTP servers.
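
Before starting the coordinator, a quick check like the one below can confirm that both ports are free. This is only a minimal sketch in Python 3 and is not part of the project: the helper name port_is_free is hypothetical, and it assumes that being able to bind a TCP socket to a port is a good enough test that the port is unused.

```python
import socket

def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use" (or a permission error).
            return False
        return True

if __name__ == "__main__":
    # 1234 and 8081 are the two ports this README asks you to keep available.
    for port in (1234, 8081):
        state = "free" if port_is_free(port) else "in use"
        print(f"port {port}: {state}")
```

If a port is reported as in use, stop the process that holds it before launching the coordinator.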

Python support

Our MapReduce supports Python development: you only need to provide a simple Python file that defines a map function and a reduce function. See the provided examples, such as the word count in mapreduce/src/main/wc.py .

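import string
import nltk

# Python 2 code: string.maketrans and the two-argument str.translate
# used below do not exist in Python 3.
def map(name, contents):
	# Normalize case so that "The" and "the" count as the same word.
	upper = contents.upper()
	# Identity table: translate() is used here only to delete characters.
	identity = string.maketrans(string.punctuation, string.punctuation)
	no_punct = upper.translate(identity, string.punctuation)
	cleaned = no_punct.translate(identity, string.digits)
	tokens = nltk.word_tokenize(cleaned)
	# Emit one single-entry dict {word: "1"} per token.
	kva = []
	for p in tokens:
		kva.append({p: "1"})
	return kva

def reduce(key, values):
	# values holds one "1" per occurrence of key.
	return str(len(values))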

To run the word count example on the input files pg-*.txt, run the following in a terminal:

go run mrcoordinator.go wc pg-*.txt

To run the KNN example:

go run mrcoordinator.go knn dataset*.txt

To run the inverted index example:

go run mrcoordinator.go inverted_index pg-*.txt

To see the output files, run

cat mr-out-* | sort | more

Experiment

word count:

[Figure: word count experiment]

inverted index:

[Figure: inverted index experiment]

KNN large dataset:

[Figure: KNN large dataset experiment]

Visualization

worker perspective

[Figure: worker perspective]

file perspective

[Figure: file perspective]

Acknowledgements

MIT 6.824

GitHub

https://github.com/yifanlu0227/mapreduce