diff --git a/Python Assignment 1/Python Class 1.txt b/Python Assignment 1/Python Class 1.txt
new file mode 100644
index 0000000..9527ee6
--- /dev/null
+++ b/Python Assignment 1/Python Class 1.txt
@@ -0,0 +1,32 @@
+Ques 1: What is JPython & CPython?
+Ans:    JPython: Jython is the JVM implementation of the Python programming language.
+        It is designed to run on the Java platform.
+        A Jython program can import and use any Java class.
+        Like a Java program, a Jython program compiles to JVM bytecode.
+        One of the main advantages is that a user interface written in Python can use GUI elements from the AWT, Swing, or SWT packages.
+        Jython, which started as JPython and was later renamed, closely follows the standard Python implementation, CPython, created by Guido van Rossum.
+        CPython: CPython is the reference implementation of the Python programming language.
+        Written in C and Python, CPython is the default and most widely used implementation of the language.
+        CPython can be described as both an interpreter and a compiler, because it compiles Python code into bytecode before interpreting it.
+        It has a foreign function interface with several languages including C, in which one must explicitly write bindings in a language other than Python.
+
+Ques 2: Basic difference between Python2 and Python3?
+Ans:    Basis of comparison    Python 3            Python 2
+        Release date           2008                2000
+        print                  print("hello")      print "hello"
+        Syntax                 simpler             comparatively difficult to understand
+        Iteration              range() function    xrange() is used for iterations
+
+Ques 3: Difference between ASCII and Unicode?
+Ans:    ASCII: It stands for American Standard Code for Information Interchange. It uses 7-bit encoding.
+        ASCII represents 128 characters.
+        Each character is usually stored in one 8-bit byte.
+        ASCII is standardized, but it covers only basic English letters, digits, punctuation, and control codes.
+
+        Unicode: It is also a character standard, but its encodings can use a variable number of bytes per character.
+        Unicode code points fit in 21 bits (the code space holds 1,114,112 code points). Unicode is a superset of ASCII. It represents more characters than ASCII.
+        Unicode text is stored as byte sequences in encodings such as UTF-32 and UTF-8.
+        Unicode is standardized.
+
+
+
diff --git a/Python Assignment 1/Python Class 2.txt b/Python Assignment 1/Python Class 2.txt
new file mode 100644
index 0000000..8c53253
--- /dev/null
+++ b/Python Assignment 1/Python Class 2.txt
@@ -0,0 +1,26 @@
+Ques: What should be the output of the following?
+      (3+4**6-9*10/2)
+Ans:  4054.0
+
+Ques: Let's say I have the string "hello this side regex". Find the total count of vowels in this string.
+Ans:  >>> vowels = 'aeiou'
+      >>> count = 0
+      >>> string = 'hello this side regex'
+      >>> for s in string:
+      ...     if s in vowels: count = count + 1
+      ...
+      >>> count    # 7
+
+Ques: Find the area of a triangle. Take the base and the height as input from the user.
+Ans:  >>> b = float(input('Enter base of a triangle: '))
+      >>> h = float(input('Enter height of a triangle: '))
+      >>> area = (b * h) / 2
+      >>> print('The area of the triangle is %0.1f' % area)
+
+Ques: Print the calendar on the terminal for a given year.
+      -allow the user to input the year.
+      -then print the calendar of that year.
+Ans:  import calendar
+      y = int(input("Input the year : "))
+      # calendar.calendar(y) returns all twelve months of year y as one multi-line string
+      print(calendar.calendar(y))
\ No newline at end of file
diff --git a/Python Assignment 2/Assignment 1.txt b/Python Assignment 2/Assignment 1.txt
new file mode 100644
index 0000000..1e26e18
--- /dev/null
+++ b/Python Assignment 2/Assignment 1.txt
@@ -0,0 +1,34 @@
+Ques 1: Find the Armstrong numbers between two numbers input by the user.
+        Armstrong numbers: 153-> 1*1*1 + 5*5*5 + 3*3*3
+Ans 1:  x, y = input("Enter two values: ").split()
+        x = int(x)
+        y = int(y)
+        for num in range(x, y + 1):    # + 1 so that y itself is also checked
+            total = 0
+            temp = num
+            while temp > 0:
+                digit = temp % 10
+                total += digit ** 3    # cubes, as in the 153 example (3-digit numbers)
+                temp //= 10
+            if num == total:
+                print(num, "is an Armstrong number")
+
+Ques 2: Let’s say you have a string “hello this world @2020!!! ”
+        $ Remove the punctuation like [“@!#$%&*()”] from the string.
+        $ Final output should be without the punctuation “hello this world 2020”.
+Ans 2:  s = 'hello this world @2020!!! '
+        punc = ['@', '!', '#', '$', '%', '&', '*', '(', ')']
+        a = []
+        for ch in s:
+            if ch in punc:
+                continue
+            a.append(ch)
+        result = ''.join(a).strip()    # strip() drops the trailing space
+        print(result)
+
+Ques 3: You have a list with words - [“Apple”, “banana”, “cat”, “REGEX”, “apple”]
+        $ Sort words in Alphabetical order
+        $ If you get output like [Apple, apple, banana], how has it happened?
+Ans 3:  li = ['Apple', 'banana', 'cat', 'REGEX', 'apple']
+        li.sort(key=str.casefold)    # case-insensitive key: 'Apple' and 'apple' compare equal, so they sort next to each other
+        li[0:3]
\ No newline at end of file
diff --git a/Python Assignment 2/Assignment 2.txt b/Python Assignment 2/Assignment 2.txt
new file mode 100644
index 0000000..4e2cf52
--- /dev/null
+++ b/Python Assignment 2/Assignment 2.txt
@@ -0,0 +1,98 @@
+Ques 1: Write a program to print a new list that contains the first character of each string in a list.
+        LIST_STATES = ["GOA","RAJASTHAN","KARNATAKA","GUJRAT","MANIPUR","MADHYA PRADESH"]
+Ans 1:  LIST_STATES = ["GOA","RAJASTHAN","KARNATAKA","GUJRAT","MANIPUR","MADHYA PRADESH"]
+        first_chars = []
+        for state in LIST_STATES:
+            first_chars.append(state[0])
+        # first_chars now holds one initial per state name
+        print(first_chars)
+
+Ques 2: Write a program to replace each string with an integer value in a given list of strings.
+        The replacement integer value should be the sum of the ASCII values of the characters of the corresponding string.
+        LIST: ['GAnga', 'Tapti', 'Kaveri', 'Yamuna', 'Narmada' ]
+Ans 2:  LIST = ['GAnga', 'Tapti', 'Kaveri', 'Yamuna', 'Narmada']
+        # Replace each string with the sum of the code points of its characters.
+        # Assigning to LIST[k] while iterating is safe here because the length never changes.
+        k = 0
+        for word in LIST:
+            total = 0
+            for ch in word:
+                total = total + ord(ch)
+            LIST[k] = total
+            k += 1
+        print(LIST)
+
+Ques 3: You have to run your Program at 9:00am. Date: 14th April 2020.
+        #HINT:
+        # You have to use datetime Module or time module...
+        # You have to convert your output in #LIST_FORMAT
+        # [ '2020-04-13' , '17:11:01.952975' ]
+        # you can use this with the help of If/Else statement
+Ans 3:  import time
+        while True:
+            x = time.ctime()
+            ls = x.split()    # e.g. ['Tue', 'Apr', '14', '09:00:00', '2020']
+            print(ls)
+            if ls == ['Tue', 'Apr', '14', '09:00:00', '2020']:
+                print("This is the time!!")
+                break
+            else:
+                time.sleep(0.5)    # poll twice a second instead of spinning
+
+Ques 4: Given a tuple: tuple = ('a','l','g','o','r','i','t','h','m')
+        1. Using the concept of slicing, print the whole tuple
+        2. Delete the element at the 3rd index, then print the tuple.
+Ans 4:  1. print(tuple[0:])
+        2. # tuples are immutable, so you cannot remove elements in place
+           # concatenating the slices before and after index 3 with + builds a new tuple without that item
+           tuple = tuple[:3] + tuple[4:]
+           print(tuple)
+
+Ques 5: Take a list REGex=[1,2,3,4,5,6,7,8,9,0,77,44,15,33,65,89,12]
+        - print only those numbers greater than 20
+        - then print those numbers which are less than or equal to 10
+        - store the above two results in two different lists.
+Ans 5:  REGex = [1,2,3,4,5,6,7,8,9,0,77,44,15,33,65,89,12]
+        l1, l2 = [], []
+        for i in REGex:
+            if i > 20:
+                print(i)
+                l1.append(i)
+            if i <= 10:
+                print(i)
+                l2.append(i)
+
+        print(l1)
+        print(l2)
+
+Ques 6: Execute standard LINUX Commands using Python Programming.
+Ans 6:  import os
+        cmd = 'wc -l my_text_file.txt > out_file.txt'
+        os.system(cmd)
+
+Ques 7: Revise *args and **kwargs Concepts.
+Ans 7:  *args (Non Keyword Arguments)
+        **kwargs (Keyword Arguments)
+        We use *args and **kwargs as parameters when we don't know in advance how many arguments will be passed to a function.
+        i.)  *args:
+             def adder(*num):
+                 total = 0
+
+                 for n in num:
+                     total = total + n
+
+                 print("Sum:", total)
+
+             adder(3,5)
+             adder(4,5,6,7)
+             adder(1,2,3,5,6)
+
+        ii.)
**kwargs:
+             def intro(**data):
+                 print("\nData type of argument:", type(data))
+
+                 for key, value in data.items():
+                     print("{} is {}".format(key, value))
+
+             intro(Firstname="Sita", Lastname="Sharma", Age=22, Phone=1234567890)
+             intro(Firstname="John", Lastname="Wood", Email="johnwood@nomail.com", Country="Wakanda", Age=25, Phone=9876543210)
\ No newline at end of file
diff --git a/Python Assignment 2/Assignment 3.txt b/Python Assignment 2/Assignment 3.txt
new file mode 100644
index 0000000..038a509
--- /dev/null
+++ b/Python Assignment 2/Assignment 3.txt
@@ -0,0 +1,58 @@
+Ques 1: Use the time module and a for loop to create a Loading..... animation 5 times. Note: you have to print only Loading.... in the animated form.
+Ans 1:  import time as t
+        # flush=True makes each dot appear immediately instead of waiting in the output buffer
+        print('Loading', end='', flush=True)
+        for i in range(5):
+            print('.', end='', flush=True)
+            t.sleep(1)
+
+Ques 2: Difference between Return and Yield ?
+Ans 2:  Return: Returns a value to the caller.
+        A return statement runs at most once per call.
+        A return statement causes the function to exit.
+        Code written after the return statement does not execute.
+        Every call runs the function from the start.
+
+        Yield: Returns a value to the caller and also preserves the function's current state.
+        A yield statement can run multiple times.
+        A function that contains yield is a generator function.
+        Code written after a yield statement executes when the generator is resumed (for example by the next call to next()).
+        Execution resumes from the point where the generator was paused.
+
+Ques 3: Make digital clock and run it for 5 sec.
+Ans 3:  import time as t
+        for i in range(5):
+            a = t.ctime()
+            b = a.split()    # split() without arguments also copes with the padded single-digit day
+            print(b[3])
+            t.sleep(1)
+
+Ques 4: Adding anything in tuple eg:(1,2,3,4) -> (1,2,3,4,5)
+Ans 4:  old = (1,2,3,4)
+        lists = list(old)
+        print(type(lists))
+        x = int(input("Enter value: "))    # int() so that the new element is a number like the others
+        lists.append(x)
+        new = tuple(lists)
+        print(new, type(new))
+
+Ques 5: Whatsapp texting using web browser library.
+Ans 5:  import webbrowser as wb
+        import time
+        from urllib.parse import quote
+        number = input("Please enter the WhatsApp number of the person you want to send a message to: ")
+        message = input("Enter the message you want to send: ")
+        t = input("Enter the time (hh:mm:ss): ")
+        while True:
+            c_time = time.ctime()
+            t_format = c_time[11:19]    # the hh:mm:ss part of e.g. 'Wed Apr 15 14:30:03 2020'
+            time.sleep(1)
+            print(f'c_time : {t_format}')
+            if t == t_format:
+                wb.open_new_tab(f'https://web.whatsapp.com/send?phone=+91{number}&text={quote(message)}')
+                break
+            elif t 0.00:
+    positive += 1
+positive = percentage(positive, int(total))
+negative = percentage(negative, int(total))
+neutral = percentage(neutral, int(total))
+polarity = percentage(polarity, int(total))
+check_max = [positive, negative, neutral]
+maxi_index = check_max.index(max(check_max))
+print(f'After analyzing {total} tweets, the reaction of people to {word} '
+      'is: ')
+if maxi_index == 0:
+    print('Positive')
+elif maxi_index == 1:
+    print('Negative')
+else:
+    print('Neutral')
+labels = ['Positive ['+str(positive)+'%]', 'Neutral ['+str(neutral)+'%]',
+          'Negative ['+str(negative)+'%]']
+sizes = [positive, neutral, negative]    # same order as labels, so the legend matches the slices
+colors = ['yellow', 'green', 'red']
+patches, texts = plt.pie(sizes, colors=colors, startangle = 90)
+plt.legend(patches, labels, loc='best')
+plt.title('Sentiment Analysis')
+plt.axis('equal')
+plt.tight_layout()
+plt.show()
\ No newline at end of file
diff --git a/Python Assignment 2/Late Night Task.txt b/Python Assignment 2/Late Night Task.txt
new file mode 100644
index 0000000..563b3cf
--- /dev/null
+++ b/Python Assignment 2/Late Night Task.txt
@@ -0,0 +1,43 @@
+Ques 1: You have to make a script to send "Hi" 5 times to a friend.
+
+Ans:    import webbrowser as web
+        import pyautogui as pg
+        import time
+
+        count = 0    # initializing count with zero
+        while count < 5:    # To send a message 5 times
+
+            web.open('https://web.whatsapp.com/send?phone=number&text=Hi')    # phone = contact number of the receiver with a country code.
-> eg: +91xxx-xxx-xxxx
+
+            time.sleep(10)
+            pg.press('enter')        # To send the message by automatically pressing "enter" on the keyboard.
+            time.sleep(3)            # To wait for the message to send so that the tab can be closed safely
+            pg.hotkey('ctrl','w')    # To close the tab
+            count += 1
+
+        web.open('http://web.whatsapp.com')
+
+
+
+
+Ques 2: You have to make a script that sends the same message to 6 persons.
+
+Ans:    import webbrowser as web
+        import pyautogui as pg
+        import time
+
+        # NOTE : number should start with +91 . eg: +91xxx-xxx-xxxx
+        numbers = ['number1', 'number2', 'number3', 'number4', 'number5', 'number6']
+
+
+        url1 = 'https://web.whatsapp.com/send?phone='
+        url2 = '&text=Hi'
+
+        for i in numbers:
+            web.open(url1+i+url2)    # Open WhatsApp Web with a particular contact
+            time.sleep(10)
+            pg.press('enter')
+            time.sleep(3)
+            pg.hotkey('ctrl','w')    # To close the tab
+
+        web.open('http://web.whatsapp.com')
diff --git a/Rishita/Assignment 2.html b/Rishita/Assignment 2.html
new file mode 100644
index 0000000..fdad4c4
--- /dev/null
+++ b/Rishita/Assignment 2.html
@@ -0,0 +1,42 @@
+Assignment 2 - Databricks
diff --git a/Rishita/Big Data Task 1.txt b/Rishita/Big Data Task 1.txt
new file mode 100644
index 0000000..6ce7e19
--- /dev/null
+++ b/Rishita/Big Data Task 1.txt
@@ -0,0 +1,28 @@
+Ques 1: Difference between Hadoop 1 and Hadoop 2?
+Ans:    In Hadoop 1, there is a single NameNode, which is therefore a single point of failure, whereas Hadoop 2.x has Active and Passive NameNodes. If the active NameNode fails, the passive NameNode takes over. As a result, Hadoop 2.x offers high availability.
+
+        In Hadoop 2, YARN provides a central resource manager that shares cluster resources among multiple applications, whereas in Hadoop 1.x MapReduce handled both resource management and data processing.
+        Hadoop 1 supports only Linux.
+        Hadoop 2 supports both Linux and Windows.
+
+
+Ques 2: In Hadoop 2, why has the block size been set to 128 MB?
+Ans:
+        The reason for this large block size is to minimize the cost of seeks and to reduce the metadata generated per block.
+        To reduce disk seeks (I/O): the larger the block size, the fewer blocks per file, and so the fewer disk seeks. Blocks can still be transferred in a reasonable time, and in parallel.
+        HDFS holds huge datasets, i.e. terabytes and petabytes of data. With a 4 KB block size, as in a typical Linux file system, there would be too many blocks and therefore too much metadata. Managing that many blocks and that much metadata would create huge overhead, which we don't want. So the block size is set to 128 MB.
+        On the other hand, the block size can't be too large, because the system would wait a very long time for the last unit of data processing to finish its work.
+
+
+        128 MB is a power of 2, which means the number has a simple binary representation:
+
+        128 MB = 131072 KB = 134217728 B = 1000000000000000000000000000 in binary (2^27)
+
+        With this number we don't waste any bits when we store data in memory.
+
+Ques 3: Why does the NameNode rely on memory more than the DataNodes do?
+Ans:    The NameNode stores only metadata about the blocks, and it keeps that metadata in RAM for fast lookup, so it needs a lot of memory. DataNodes don't need much memory.
+
+Ques 4: Suppose you have 10 PB of data. The metadata stores one object per file and folder ----> each object is 200 B. What is the minimum NameNode RAM you need to manage the metadata for the DataNodes in the cluster?
+Estimate minimum Namenode RAM size for HDFS with 10 PB capacity, block size 64 MB, average metadata size for each block is 200 B, replication factor is 3.
+Ans:    10 PB / (64 MB * 3) * 200 B = (10 * 10^15) / (64 * 10^6 * 3) * 200 B = 10^10 / (64 * 3) * 200 B ≈ 1.04e10 B ≈ 10.4 GB
\ No newline at end of file
diff --git a/Rishita/Big Data Task 2.txt b/Rishita/Big Data Task 2.txt
new file mode 100644
index 0000000..0dee727
--- /dev/null
+++ b/Rishita/Big Data Task 2.txt
@@ -0,0 +1,44 @@
+Ques 1: Find out the difference between put cmd and copyFromLocal?
+Ans 1:  -put and -copyFromLocal are almost the same command, with one small difference.
+
+        -copyFromLocal accepts only local sources: it copies from the local file system to the destination file system.
+
+        -put can copy a single source or multiple sources from the local file system to the destination file system, and it can also read its input from stdin.
+
+        In short, copyFromLocal is similar to the put command, but the source is restricted to a local file reference.
+
+Ques 2: You have to list all files and subfolders and view them in human readable form. Make sure to give the location of parent /regex .
+Ans 2:  hdfs dfs -ls -R -h /regex
+
+
+Ques 3: Write a command to see the disk usage of all the replicas used.
+Ans 3:  You can check the free space in HDFS with a couple of commands. The -df command shows the configured capacity, available free space and used space of a file system in HDFS.
+        $ hdfs dfs -df
+
+        You can specify the -h option with the df command for more readable and concise output:
+        $ hdfs dfs -df -h
+
+        You can view the used storage in the entire HDFS file system with the following command (in recent Hadoop versions the output also includes the space consumed by all replicas):
+        $ hdfs dfs -du /
+
+
+Ques 4: Remove a file using rm -r (you will see it being moved to trash). Then remove a file again so that it is deleted without being moved to trash.
+Ans 4:  $ hdfs dfs -rm -r /regex/file.txt
+
+        $ hdfs dfs -rm -skipTrash /regex/file.txt
+
+
+Ques 5: Retrieve the files from the Hadoop file system to your local file system.
+Ans 5:  $ hdfs dfs -copyToLocal /regex/hello.txt
+
+
+Ques 6: By default replication factor in hdfs is 3. If you want to change the replication factor to 5 at the time of putting the files.
How will you do it?
+Ans 6:  $ hdfs dfs -D dfs.replication=5 -put yo.txt /regex/    (for a file already in HDFS: hdfs dfs -setrep -w 5 /regex/yo.txt)
+
+Ques 7: Suppose you have a file and your metadata is stored in namenode. You have to find the location, blocks, and filename of the file.
+Ans 7:  $ hdfs fsck /regex/yo.txt -files -blocks -locations
+        For a cluster-wide summary of DataNodes and capacity instead:
+        $ hdfs dfsadmin -report
+
+Ques 8: Change the owner of the file from hdfs to cloudera or any other user.
+Ans 8:  hdfs dfs -chown cloudera /regex/yo.txt
diff --git a/Rishita/Hive Day1.html b/Rishita/Hive Day1.html
new file mode 100644
index 0000000..9b2bf96
--- /dev/null
+++ b/Rishita/Hive Day1.html
@@ -0,0 +1,42 @@
+Hive Day1 - Databricks
diff --git a/Rishita/Talend Task/Aggregate.png b/Rishita/Talend Task/Aggregate.png
new file mode 100644
index 0000000..3dadf35
Binary files /dev/null and b/Rishita/Talend Task/Aggregate.png differ
diff --git a/Rishita/Talend Task/CSV to XML.png b/Rishita/Talend Task/CSV to XML.png
new file mode 100644
index 0000000..ab3b7ac
Binary files /dev/null and b/Rishita/Talend Task/CSV to XML.png differ
diff --git a/Rishita/Talend Task/Row Filter.png b/Rishita/Talend Task/Row Filter.png
new file mode 100644
index 0000000..98cbf76
Binary files /dev/null and b/Rishita/Talend Task/Row Filter.png differ
diff --git a/Rishita/Talend Task/Word Find.png b/Rishita/Talend Task/Word Find.png
new file mode 100644
index 0000000..103150d
Binary files /dev/null and b/Rishita/Talend Task/Word Find.png differ