diff --git a/dataeng/README.md b/dataeng/README.md
index 3df8bdd1..dac11c47 100644
--- a/dataeng/README.md
+++ b/dataeng/README.md
@@ -1,187 +1,149 @@
-### Prerequisites
-* Python 3.7 or greater
-* Docker 19.03 or greater
-* Git 2.28 or greater
-* Postgres 13 or greater
-
-## Level 1
-
-### Files definitions:
-
-- src_data - Path with source data needed to be processed.
-- processed_data - Path with output processed data.
-- user_id.jpg - User image file, for example, 0001.jpg. Could be several for different users in source data path.
-- user_id.csv - User info file, for example, 0001.csv. Could be several for different users in source data path.
-
-User csv file contains next columns:
-
-1. first_name - User first name
-2. last_name - User last name
-3. birthts - User birthdate timestamp in milliseconds UTC
-
-Test csv and img files could be found in the [02-src-data](./02-src-data) folder
-
-**For example:**
-
-```text
-first_name, last_name, birthts
-Ivan, Ivanov, 946674000000
-```
+# README
+
+### Solution for the Data Engineering internship task
+#### There are two working modes
+##### First working mode
+Read all of the users' CSV files, check the previously processed output file, filter out duplicate user_ids and update the processed output file with the new users.
+##### Second working mode
+Available only when a processed output file already exists: the client can pick one of the users in the database and change that user's first_name and last_name.
+##### Code structure
+The code is divided into a main part and four functions:
+1. A simple function that reads a single CSV file and returns its header row and data row as lists.
+2. A function that concatenates the data from all source CSV files.
+3. A function that writes the combined data into a CSV file.
+4. A function that checks whether a processed output file already exists.
+
+The main function reads the selected working mode and dispatches to the corresponding functionality; a minimal sketch of the first mode's merge step is shown below.
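+
+The first working mode boils down to concatenating the existing output with the freshly read rows and dropping duplicate user_ids. The snippet below is an illustrative, self-contained sketch of that step with a simplified column set and made-up sample rows; `main.py` further down performs the same `pd.concat(...).drop_duplicates(...)` call on the full output columns.
+```python
+import pandas as pd
+
+# Previously processed output (one already known user) ...
+existing = pd.DataFrame({'user_id': [1], 'first_name': ['Ivan'], 'last_name': ['Ivanov']})
+# ... and freshly read source rows (user 1 again plus a new user 2).
+new_rows = pd.DataFrame({'user_id': [1, 2], 'first_name': ['Ivan', 'Petr'], 'last_name': ['Ivanov', 'Petrov']})
+
+# Concatenate and drop duplicate user_ids; rows already present in the output
+# win because drop_duplicates keeps the first occurrence it sees.
+merged = pd.concat([existing, new_rows]).drop_duplicates(subset=['user_id']).reset_index(drop=True)
+print(merged)  # user 1 appears once, user 2 is appended
+```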
+
+## SQL Answers
+### 1. Rewrite the SQL without subquery:
+```sql
+SELECT u.id
+FROM users AS u
+LEFT JOIN departments AS d
+    ON d.user_id = u.id
+    AND d.department_id = 1
+WHERE d.user_id IS NULL;
+```
+The LEFT JOIN restricted to department_id = 1 together with the IS NULL filter keeps exactly the users that have no row in that department, which is what the original NOT IN subquery selected.
-
-### Data processing description
-
-1. Read csv file
-2. Match images for each user
-3. Combine data from CSV and image path
-4. Update processed_data/output.csv CSV file and add new data. Important we can update data for previously processed
-   user. In output CSV and DB we should not duplicate records. Output CSV file format: user_id, first_name,
-   last_name, birthts, img_path
-
-## Task
-
-Implement a script to process files from the `src_data` folder.
-
-## Results delivery format
-
-Results should be implemented as a python script with demo data. Also should be
-provided the README.md file with the description of your solution.
-
-## Level 2
-The same as **Level 1** with the following extras.
-
-## Results delivery format
-
-Results should be implemented as a service. The service should periodically read source data and process it.
-Also, the service should implement web server with next endpoints:
-- **GET** /data - get all records from DB in JSON format. Need to implement filtering by: is_image_exists = True/False, user min_age and max_age in years.
-- **POST** /data - manually run data processing in src_data
-
-Should be provided the README.md file with the description of your solution.
-
-## Level 3
-The same as **Level 2** but with next differences.
-
-### Files definitions:
-Source data and processed data should store in Minio.
-Minio service already defined in [docker-compose](./01-docker-compose/docker-compose.yml) file.
-
-### Data processing description
-
-1. Read csv file
-2. Match images for each user
-3. Combine data from CSV and image path
-4. Update processed_data/output.csv CSV file and add new data. Important we can update data for previously processed
-   user. In output CSV and DB we should not to duplicate records. Output CSV file format: user_id, first_name,
-   last_name, birthts, img_path
-5. Write this combined data to DB. Record should contain next columns: id, user_id, first_name, last_name, birthdate, img_path. id - autoincrement unique record id.
-Postgres DB service already defined in [docker-compose](./01-docker-compose/docker-compose.yml)
-
-## Results delivery format
-
-Results should be implemented as a service. The service should periodically read source data and process it.
-Also, the service should implement web server with next endpoints:
-- **GET** /data - get all records from DB in JSON format. Need to implement filtering by: is_image_exists = True/False, user min_age and max_age in years.
-- **POST** /data - manually run data processing in src_data
-
-The solution should work in docker-compose. As base template can be taken [docker-compose](./01-docker-compose/docker-compose.yml) file.
-
-**As a solution, you should implement one of the levels. You don't need to implement all of them, just choose the one you can solve.**
-## Coding Tasks for Data Engineers
-The following tasks cover different sections to check candidate's basic knowledge in SQL, Algorithms and Linux shell.
-
-### SQL
-1. Rewrite this SQL without subquery:
-```sql
-SELECT id
-FROM users
-WHERE id NOT IN (
-    SELECT user_id
-    FROM departments
-    WHERE department_id = 1
-);
-```
+### 2. Write a SQL query to find all duplicate lastnames in a table named **user**
+```sql
+SELECT lastname, COUNT(lastname) AS occurrences
+FROM "user"
+GROUP BY lastname
+HAVING COUNT(lastname) > 1;
+```
-2. Write a SQL query to find all duplicate lastnames in a table named **user**
-```text
-+----+-----------+-----------
-| id | firstname | lastname |
-+----+-----------+-----------
-| 1 | Ivan | Sidorov |
-| 2 | Alexandr | Ivanov |
-| 3 | Petr | Petrov |
-| 4 | Stepan | Ivanov |
-+----+-----------+----------+
-```
+### 3. Write a SQL query to get a username from the **user** table with the second highest salary from the **salary** table. Show the username and its salary in the result.
+```sql
+SELECT u.username, s.salary
+FROM salary AS s
+JOIN "user" AS u
+    ON s.user_id = u.id
+ORDER BY s.salary DESC
+LIMIT 1 OFFSET 1;
+```
+`OFFSET 1` skips the single highest salary; the sample data contains no ties, otherwise `DENSE_RANK()` would be the safer choice.
-3. Write a SQL query to get a username from the **user** table with the second highest salary from **salary** tables. Show the username and it's salary in the result.
-```sql
-+---------+--------+
-| user_id | salary |
-+----+--------+----+
-| 1 | 1000 |
-| 2 | 1100 |
-| 3 | 900 |
-| 4 | 1200 |
-+---------+--------+
-```
-```sql
-+---------+--------+
-| id | username |
-+----+--------+----+
-| 1 | Alex |
-| 2 | Maria |
-| 3 | Bob |
-| 4 | Sean |
-+---------+-------+
-```
+## Algorithms & Data Structures
-### Algorithms and Data Structures
-1. Optimise execution time of this Python code snippet:
-```
-def count_connections(list1: list, list2: list) -> int:
-    count = 0
-
-    for i in list1:
-        for j in list2:
-            if i == j:
-                count += 1
-
-    return count
-```
+### 1. Optimization of the Python code snippet:
+```python
+from collections import Counter
+
+
+def count_connections(list1: list, list2: list) -> int:
+    # Count matches per distinct shared value instead of comparing every pair,
+    # bringing the nested O(len(list1) * len(list2)) loop down to linear time.
+    counter1 = Counter(list1)
+    counter2 = Counter(list2)
+    intersections = set(list1).intersection(list2)
+    total = 0
+    for value in intersections:
+        total += counter1[value] * counter2[value]
+    return total
+```
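+The rewrite can be sanity-checked against the original quadratic implementation; the snippet below is a small self-contained check on a made-up input (both functions are repeated here so it runs on its own).
+```python
+from collections import Counter
+
+
+def count_connections_naive(list1: list, list2: list) -> int:
+    # The original O(n * m) version from the task description.
+    count = 0
+    for i in list1:
+        for j in list2:
+            if i == j:
+                count += 1
+    return count
+
+
+def count_connections(list1: list, list2: list) -> int:
+    # The optimised O(n + m) version from the answer above.
+    counter1 = Counter(list1)
+    counter2 = Counter(list2)
+    total = 0
+    for value in set(list1).intersection(list2):
+        total += counter1[value] * counter2[value]
+    return total
+
+
+# The value 2 occurs twice in each list, so there are 2 * 2 = 4 matching pairs.
+assert count_connections([1, 2, 2, 3], [2, 2, 4]) == count_connections_naive([1, 2, 2, 3], [2, 2, 4]) == 4
+```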
+### 2. Given a string `s`, find the length of the longest substring without repeating characters. Analyze your solution and please provide Space and Time complexities.
+```python
+def findLongestSubstring(string):
+    if len(string) == 0:
+        return 0
+    n = len(string)
+    # Starting point of the current substring.
+    st = 0
+    # Length of the longest substring found so far.
+    maxlen = 0
+    # Starting index of that longest substring.
+    start = 0
+    # Last seen position of each visited character.
+    pos = {string[0]: 0}
+    for i in range(1, n):
+        if string[i] not in pos:
+            # First occurrence of this character: just record its position.
+            pos[string[i]] = i
+        else:
+            # The character was seen before; check whether that occurrence
+            # lies inside the current substring.
+            if pos[string[i]] >= st:
+                # Close the current substring and update the best one so far.
+                currlen = i - st
+                if maxlen < currlen:
+                    maxlen = currlen
+                    start = st
+                # The next substring starts right after the previous
+                # occurrence of the repeated character.
+                st = pos[string[i]] + 1
+            # Update the last seen position of the character.
+            pos[string[i]] = i
+    # Account for the substring that runs to the end of the string
+    # (length n - st), so inputs like "au" are handled correctly.
+    if maxlen < n - st:
+        maxlen = n - st
+        start = st
+    # The longest substring without repeating characters is
+    # string[start : start + maxlen]; its length is the required answer.
+    return string[start: start + maxlen]
+```
+The function returns the longest substring itself (or 0 for an empty input), so the requested length is simply `len()` of the result.
+
+**Time Complexity:** O(n)
+**Auxiliary Space:** O(n)
-
-2. Given a string `s`, find the length of the longest substring without repeating characters.
-   Analyze your solution and please provide Space and Time complexities.
-
-**Example 1**
-```text
-Input: s = "abcabcbb"
-Output: 3
-Explanation: The answer is "abc", with the length of 3.
-```
-**Example 2**
-```text
-Input: s = "bbbbb"
-Output: 1
-Explanation: The answer is "b", with the length of 1.
-```
-**Example 3**
-```text
-Input: s = "pwwkew"
-Output: 3
-Explanation: The answer is "wke", with the length of 3.
-Notice that the answer must be a substring, "pwke" is a subsequence and not a substring.
-```
-**Example 3**
-```text
-Input: s = ""
-Output: 0
-```
+### 3. Given a sorted array of distinct integers and a target value, return the index if the target is found. If not, return the index where it would be if it were inserted in order.
+```python
+def binary_search(arr: list, low, high, target):
+    # Targets outside the array's range map straight to the two ends.
+    if target < arr[0]:
+        return 0
+    elif target > arr[-1]:
+        return len(arr)
+    if high >= low:
+        mid = (high + low) // 2
+        # The target is present at the middle itself.
+        if arr[mid] == target:
+            return mid
+        # The target is smaller than the middle element, so it can only
+        # be in the left subarray.
+        elif arr[mid] > target:
+            return binary_search(arr, low, mid - 1, target)
+        # Otherwise it can only be in the right subarray.
+        else:
+            return binary_search(arr, mid + 1, high, target)
+    else:
+        # The target is not present; low (== high + 1) is its insertion index.
+        return high + 1
+```
+A linear scan gives the same answer in O(n) instead of O(log n):
+```python
+def linear_search(list1: list, target):
+    # Return the index of the first element >= target, which is the position
+    # of the target if it is present or its insertion point otherwise.
+    for i in range(len(list1)):
+        if target <= list1[i]:
+            return i
+    # The target is greater than every element, so it belongs at the end.
+    return len(list1)
+```
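+Usage sketch for both functions (they are assumed to be defined as above; the sample array comes from the task's example):
+```python
+nums = [1, 3, 5, 6]
+
+print(binary_search(nums, 0, len(nums) - 1, 5))  # 2, since 5 is found at index 2
+print(binary_search(nums, 0, len(nums) - 1, 2))  # 1, since 2 would be inserted at index 1
+print(linear_search(nums, 7))                    # 4, since 7 would be appended at the end
+```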
-3. Given a sorted array of distinct integers and a target value, return the index if the target is found. If not, return the index where it would be if it were inserted in order.
-**Example:**
-```text
-Input: nums = [1,3,5,6], target = 5
-Output: 2
-```
-### Linux Shell
-1. List processes listening on ports 80 and 443
-2. List process environment variables by given PID
-3. Launch a python program `my_program.py` through CLI in the background. How would you close it after some period of time?
+## Linux Administration
+### 1. List processes listening on ports 80 and 443
+```bash
+# -t TCP sockets, -n numeric addresses, -l listening only, -p owning process (needs root)
+sudo netstat -tnlp | grep :443
+sudo netstat -tnlp | grep :80
+```
+### 2. List process environment variables by given PID
+```bash
+# Entries in /proc/<PID>/environ are NUL-separated; tr prints one variable per line.
+cat /proc/[process ID]/environ | tr '\0' '\n'
+```
+### 3. Launch a python program `my_program.py` through CLI in the background. How would you close it after some period of time?
+```bash
+nohup python my_program.py &    # keeps running after the shell session ends
+ps -ef | grep my_program.py     # later: look up its PID
+kill [PID]                      # terminate it (use kill -9 only if it ignores SIGTERM)
+```
diff --git a/dataeng/main.py b/dataeng/main.py
new file mode 100644
index 00000000..45e8c5a5
--- /dev/null
+++ b/dataeng/main.py
@@ -0,0 +1,112 @@
+# Import necessary libraries
+import csv
+import os
+from os import listdir
+from os.path import isfile, join
+import pandas as pd
+from pathlib import Path
+
+# Fixed path variables
+home = str(Path.home())
+wd = join(home, 'internship/dataeng')
+src_wd = join(wd, '02-src-data')
+prc_wd = join(wd, 'processed_data')
+out_wd = join(prc_wd, 'output.csv')
+
+# Working-mode menu
+wor_mode = "press 1 for reading the source files and updating the output file \n" \
+           "press 2 for editing the processed data \n" \
+           "press any other key to quit"
+
+
+def main():
+    print(wor_mode)
+    wor_sel = input()
+
+    if wor_sel == '1':
+        df = check_processed_file(prc_wd)
+        if not df.empty:
+            # An output file already exists: merge it with the freshly read source
+            # rows and drop duplicate user_ids (existing records take precedence,
+            # since drop_duplicates keeps the first occurrence).
+            headers, data_rows = read_all_csv(src_wd)
+            ndf = pd.DataFrame(data_rows, columns=headers)
+            ndf['user_id'] = ndf['user_id'].astype('int64')
+            filtered_df = pd.concat([df, ndf]).drop_duplicates(subset=['user_id']).reset_index(drop=True)
+            write_df_csv(out_wd, headers, data_rows, filtered_df)
+        else:
+            # First run: write everything read from the source folder.
+            headers, data_rows = read_all_csv(src_wd)
+            write_df_csv(out_wd, headers, data_rows)
+
+    elif wor_sel == '2':
+        df = check_processed_file(prc_wd)
+        if not df.empty:
+            try:
+                user_id = int(input("Please enter the user id to edit: "))
+                user_idx = df[df['user_id'] == user_id].index.values[0]
+                user_in = input("New first name: ")
+                df.at[user_idx, "first_name"] = user_in
+                user_in = input("New last name: ")
+                # The leading space matches the unstripped CSV header " last_name".
+                df.at[user_idx, " last_name"] = user_in
+                print(df)
+                write_df_csv(out_wd, dataframe=df)
+            except (ValueError, IndexError):
+                print("Please enter the id number of an existing user")
+        else:
+            print("No database available yet")
+    return None
+
+
+def read_csv_simple(file_dir):
+    # Read one source CSV and return its header row and its single data row.
+    with open(file_dir, encoding='utf-8') as csv_file:
+        data = list(csv.reader(csv_file, delimiter=','))
+    return data[0], data[1]
+
+
+def read_all_csv(src_wd):
+    headers = []
+    data_rows = []
+    onlyfiles = [f for f in listdir(src_wd) if isfile(join(src_wd, f))]
+
+    for file_path in onlyfiles:
+        fp = join(src_wd, file_path)
+        # Split the extension from the path and normalise it to lowercase.
+        ext = os.path.splitext(fp)[-1].lower()
+        fp_wo_ext = os.path.splitext(fp)[0].lower()
+        # Now we can simply use == to check for equality, no need for wildcards.
+        if ext == ".csv":
+            headers, data = read_csv_simple(fp)
+            # The user id is the four-digit file name, e.g. .../0001.csv -> 0001.
+            user_id = f"{fp_wo_ext[-4:]}"
+            img_fp = f"{fp_wo_ext}.png"
+            data.insert(0, f"{user_id}")
+            data.append(f"{img_fp}")
+            data_rows.append(data)
+    headers.insert(0, 'user_id')
+    headers.append('img_path')
+    return headers, data_rows
+
+
+def write_df_csv(out_wd, headers=None, data=None, dataframe=pd.DataFrame()):
+    # Write either the given DataFrame or a new one built from headers and data.
+    if not dataframe.empty:
+        dataframe.to_csv(out_wd, index=False, encoding="utf-8")
+    else:
+        df = pd.DataFrame(data, columns=headers)
+        df.to_csv(out_wd, index=False, encoding="utf-8")
+    return True
+
+
+def check_processed_file(prc_wd):
+    # Return the previously processed output as a DataFrame, or an empty
+    # DataFrame (creating the processed folder) if there is none yet.
+    if os.path.exists(out_wd):
+        return pd.read_csv(out_wd, encoding="utf-8")
+    try:
+        os.mkdir(prc_wd)
+    except FileExistsError:
+        print("processed folder exists")
+    return pd.DataFrame()
+
+
+if __name__ == '__main__':
+    main()
diff --git a/dataeng/optimization.py b/dataeng/optimization.py
new file mode 100644
index 00000000..ed16523e
--- /dev/null
+++ b/dataeng/optimization.py
@@ -0,0 +1,129 @@
+from collections import Counter
+import numpy as np
+
+
+# Two random demo lists whose values fall in {0.0, 1.0, 2.0, 3.0}.
+list1 = list(np.round(np.random.rand(100)*3))
+list2 = list(np.round(np.random.rand(100)*3))
+
+
+def count_connections(list1: list, list2: list) -> int:
+    # Count matches per distinct shared value instead of comparing every pair.
+    counter1 = Counter(list1)
+    counter2 = Counter(list2)
+    intersections = set(list1).intersection(list2)
+    total = 0
+    for value in intersections:
+        total += counter1[value] * counter2[value]
+    return total
+
+
+def findLongestSubstring(string):
+    if len(string) == 0:
+        return 0
+    n = len(string)
+    # Starting point of the current substring.
+    st = 0
+    # Length of the longest substring found so far.
+    maxlen = 0
+    # Starting index of that longest substring.
+    start = 0
+    # Last seen position of each visited character.
+    pos = {string[0]: 0}
+    for i in range(1, n):
+        if string[i] not in pos:
+            # First occurrence of this character: just record its position.
+            pos[string[i]] = i
+        else:
+            # The character was seen before; check whether that occurrence
+            # lies inside the current substring.
+            if pos[string[i]] >= st:
+                # Close the current substring and update the best one so far.
+                currlen = i - st
+                if maxlen < currlen:
+                    maxlen = currlen
+                    start = st
+                # The next substring starts right after the previous
+                # occurrence of the repeated character.
+                st = pos[string[i]] + 1
+            # Update the last seen position of the character.
+            pos[string[i]] = i
+    # Account for the substring that runs to the end of the string
+    # (length n - st), so inputs like "au" are handled correctly.
+    if maxlen < n - st:
+        maxlen = n - st
+        start = st
+    # The longest substring without repeating characters is
+    # string[start : start + maxlen]; its length is the required answer.
+    return string[start: start + maxlen]
+
+
+string = "abcabcbb"
+print(findLongestSubstring(string))  # abc
+string = "bbbbb"
+print(findLongestSubstring(string))  # b
+string = "pwwkew"
+print(findLongestSubstring(string))  # wke
+string = ""
+print(findLongestSubstring(string))  # 0
+
+
+def linear_search(list1: list, target):
+    # Return the index of the first element >= target, which is the position
+    # of the target if it is present or its insertion point otherwise.
+    for i in range(len(list1)):
+        if target <= list1[i]:
+            return i
+    # The target is greater than every element, so it belongs at the end.
+    return len(list1)
+
+
+# Returns the index of target in arr if present,
+# otherwise the index at which it should be inserted.
+def binary_search(arr, low, high, target):
+    # Targets outside the array's range map straight to the two ends.
+    if target < arr[0]:
+        return 0
+    elif target > arr[-1]:
+        return len(arr)
+    if high >= low:
+        mid = (high + low) // 2
+        # The target is present at the middle itself.
+        if arr[mid] == target:
+            return mid
+        # The target is smaller than the middle element, so it can only
+        # be in the left subarray.
+        elif arr[mid] > target:
+            return binary_search(arr, low, mid - 1, target)
+        # Otherwise it can only be in the right subarray.
+        else:
+            return binary_search(arr, mid + 1, high, target)
+    else:
+        # The target is not present; low (== high + 1) is its insertion index.
+        return high + 1
+
+
+# Test array
+arr = [1, 2, 3, 4, 5, 6, 7, 10, 15]
+
+result = binary_search(arr, 0, len(arr) - 1, 4.5)
+idx = linear_search(arr, 2)
+print(idx, result)  # 1 4  (2 is at index 1; 4.5 would be inserted at index 4)