A full-stack real-time chat application that provides secure authentication using face recognition, real-time messaging with WebSockets, group chat management, and integrated speech-to-text functionality. This application leverages AWS Rekognition for face matching and Google Speech-to-Text API for audio transcription.
- Overview
- Features
- Architecture & Tech Stack
- Installation & Setup
- Environment Variables
- Usage
- WebSocket Server Details
- API Endpoints & Routes
- Deployment
- Contact
This application is designed to provide a secure and interactive chatting experience. Users register by creating an account (name, email, and password) and are prompted to capture a selfie using their webcam. This image is processed with AWS Rekognition and securely stored as part of the user's profile. During subsequent logins, users must verify their identity with a live selfie. Once authenticated, users can create or join chat groups, invite others via email, and communicate using both text and voice (with real-time speech-to-text conversion using the Google Speech-to-Text API).
- User Authentication with Face Matching:
  - Sign-up with name, email, password, and selfie registration
  - Login requires both traditional credentials and real-time facial verification
  - Integration with AWS Rekognition for secure face matching
- Chat Rooms & Messaging:
  - Create chat rooms and invite members
  - Real-time messaging using WebSockets (ws)
  - Persistent message storage using Redis to maintain conversation history
- Speech-to-Text Integration:
  - Record audio messages that are transcribed to text using the Google Speech-to-Text API
  - Option to review and edit transcriptions before sending
- Real-Time Communication:
  - WebSocket server handles three main events: user identification, room joining, and chat messaging
  - Instant broadcasting of messages to all room participants
- Frontend:
  - Next.js for UI rendering and API routes
  - Webcam integration for capturing selfies during sign-up and login
  - Chat UI with text and speech-to-text capabilities
- Backend:
  - Next.js API routes for handling authentication, face matching, chat room management, and message storage
  - Node.js server for real-time communication using WebSockets
  - Redis for persistent session and message storage
- Speech-to-Text Service:
  - GCP Cloud Run function, a separate microservice that handles speech-to-text conversion
  - GCP bucket for audio file storage
- WebSocket Server:
  - Containerized and deployed as a separate microservice on Railway
- Third-Party Integrations:
  - AWS Rekognition: Face recognition and matching
  - Google Speech-to-Text API: Converting recorded audio into text
- Deployment:
  - Vercel for front-end deployment
  - Environment variables securely managed through platform-specific settings
  - GCP cloud function for speech-to-text service deployment
  - Containerized WebSocket server on Railway
For detailed setup documentation, see https://chatsphere-mzgl.vercel.app/docs. Alternatively, follow the manual setup below, which requires:
- Node.js (v14+)
- npm or yarn
- Redis instance (local or hosted)
- AWS account with Rekognition permissions
- Google Cloud account with Speech-to-Text API enabled
- Clone the Repository:

      git clone https://github.com/swarnikaraj/chatsphere.git
      cd chatsphere

- Install Dependencies:

      npm install
      # or
      yarn install

- Configure Environment Variables:
  Create a .env file in the root directory and add:

      # Redis configuration
      REDIS_URL=your_redis_connection_url
      # WebSocket Server Port
      WS_PORT=8080
      # AWS Credentials
      AWS_ACCESS_KEY_ID=your_aws_access_key
      AWS_SECRET_ACCESS_KEY=your_aws_secret_key
      AWS_REGION=your_aws_region
      # Google Speech-to-Text API Key
      GOOGLE_SPEECH_API_KEY=your_google_api_key
      # Additional configurations as needed...
- Start the socket server application:

      cd socket-server
      docker-compose up -d
      docker-compose logs -f

  To stop it:

      docker-compose down

  Or build and run the image directly:

      docker build -t websocket-server .
      docker run -d \
        --name websocket-server \
        -p 8080:8080 \
        -e REDIS_URL="rediss://default:your_password@your-redis-host:6379" \
        -e WS_PORT=8080 \
        -e NODE_ENV=production \
        --cpus=0.5 \
        --memory=512m \
        websocket-server

- Start the speech-to-text service:

      cd speechToTech-service
      # set up and activate a Python virtual environment (the commands depend on your OS)
      pip install -r requirements.txt
      python main.py
- Start the gui application:
  - For Development:

        cd gui
        npm run dev # or yarn dev

  - For Production Build:

        cd gui
        npm run build
        npm start # or equivalent yarn commands
- Environment Variables:

      AWS_ACCESS_KEY_ID=
      AWS_SECRET_ACCESS_KEY=
      AWS_REGION=
      S3_BUCKET=
      NEXTAUTH_SECRET=
      NEXTAUTH_URL=http://localhost:3000
      DATABASE_URL=
      EMAIL_USER=
      EMAIL_PASS=
      NEXT_PUBLIC_SOCKET_URL=''
      NEXT_PUBLIC_APP_URL=http://localhost:3000
      NEXT_PUBLIC_WS_URL=ws://localhost:8080
      NEXT_PUBLIC_TRANSCRIPTION_SERVICE_URL=
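A blank variable is easy to miss until a request fails at runtime, so it can help to check the environment once at server-side startup. The following is a minimal sketch, not part of the project: the helper names `findMissingVars` and `assertEnv` are hypothetical, and the choice of which variables count as required is an assumption made for illustration.

```javascript
// Variable names come from the list above. Treating exactly these as
// required is an assumption made for this sketch.
const REQUIRED_VARS = [
  "AWS_ACCESS_KEY_ID",
  "AWS_SECRET_ACCESS_KEY",
  "AWS_REGION",
  "NEXTAUTH_SECRET",
  "NEXT_PUBLIC_WS_URL",
];

// Hypothetical helper: returns the names of required variables that are
// missing or blank in the given environment object.
function findMissingVars(env, required = REQUIRED_VARS) {
  return required.filter((name) => {
    const value = env[name];
    return typeof value !== "string" || value.trim() === "";
  });
}

// Hypothetical helper: throws a single error that lists every missing
// variable, so all gaps are reported at once.
function assertEnv(env = process.env) {
  const missing = findMissingVars(env);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
}
```

Calling `assertEnv()` before the server starts listening would surface configuration gaps early. Because the lookup is dynamic, this sketch is meant for server-side code rather than browser bundles.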
- User Registration:
  - Sign up with your name, email, and password
  - Grant camera access to capture a selfie
  - The captured image is processed via AWS Rekognition for future verification
- User Login:
  - Enter your username and password
  - A new real-time selfie is captured and matched against the stored image
  - On successful verification, you are directed to the main dashboard
- Chat Rooms & Messaging:
  - Create a new chat room and invite members via email
  - Upon invitation acceptance, join the chat room
  - Send messages as text or record audio that is transcribed into text
- Real-Time Updates:
  - WebSocket events manage user identification, joining rooms, and live message broadcasting
  - Redis ensures message persistence and quick access to recent chat history
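To illustrate what "quick access to recent chat history" can look like, here is a minimal in-memory stand-in for a capped, newest-first list per room. It is only a sketch: the `RecentHistory` class is hypothetical, and modelling the history on Redis list commands (LPUSH, LTRIM, LRANGE) is an assumption, since the source does not say which Redis data structure the project uses.

```javascript
// In-memory stand-in for a capped per-room history (hypothetical class).
class RecentHistory {
  constructor(limit = 50) {
    this.limit = limit;
    this.rooms = new Map(); // roomId -> array of messages, newest first
  }

  // Comparable to LPUSH followed by LTRIM 0 limit-1: push to the head,
  // then drop anything beyond the cap.
  add(roomId, message) {
    const list = this.rooms.get(roomId) || [];
    list.unshift(message);
    list.length = Math.min(list.length, this.limit);
    this.rooms.set(roomId, list);
  }

  // Comparable to LRANGE 0 count-1, then reversed so the oldest of the
  // returned messages comes first, which suits rendering a chat log.
  recent(roomId, count = this.limit) {
    const list = this.rooms.get(roomId) || [];
    return list.slice(0, count).reverse();
  }
}
```

Capping on write keeps reads cheap, because a client that joins a room only ever fetches a bounded number of messages.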
The WebSocket server (located in websocket-server/server.js) handles the following:
- User Identification: On connection, the client sends an "identify" event which maps the user ID to the active WebSocket connection.
- Room Joining: The "join-room" event assigns the user to a chat room. The server stores this information in Redis for persistence and logs room participation.
- Chat Messaging: The "chat-message" event broadcasts messages to all participants in the room. Each message is stamped with a unique ID and a timestamp before broadcasting.
The server also includes robust logging and error handling to monitor client connections, message flows, and Redis operations.
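The three events described above can be sketched as a small dispatcher. This is an illustration under stated assumptions, not the code in server.js: the message fields (`type`, `userId`, `roomId`, `text`) and the helper names are hypothetical, and room membership is kept in memory here, whereas the real server is described as persisting it in Redis.

```javascript
const { randomUUID } = require("crypto");

// Hypothetical in-memory state for the sketch.
function createChatState() {
  return {
    socketsByUser: new Map(), // userId -> connection object
    membersByRoom: new Map(), // roomId -> Set of userIds
  };
}

// Routes one parsed client message. `socket` only needs a send(string)
// method, which WebSocket connection objects provide.
function handleEvent(state, socket, message) {
  switch (message.type) {
    case "identify":
      // Map the user ID to the active connection.
      state.socketsByUser.set(message.userId, socket);
      return null;
    case "join-room": {
      if (!state.membersByRoom.has(message.roomId)) {
        state.membersByRoom.set(message.roomId, new Set());
      }
      state.membersByRoom.get(message.roomId).add(message.userId);
      return null;
    }
    case "chat-message": {
      // Stamp the message with a unique ID and a timestamp, then
      // send it to every identified member of the room.
      const stamped = {
        id: randomUUID(),
        timestamp: new Date().toISOString(),
        roomId: message.roomId,
        userId: message.userId,
        text: message.text,
      };
      const members = state.membersByRoom.get(message.roomId) || new Set();
      for (const userId of members) {
        const target = state.socketsByUser.get(userId);
        if (target) target.send(JSON.stringify(stamped));
      }
      return stamped;
    }
    default:
      throw new Error(`Unknown event type: ${message.type}`);
  }
}
```

Keeping the routing in a plain function that accepts any object with a `send` method makes it possible to exercise the logic with fake connections, without opening a real socket.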
- User Authentication:
  - POST /api/registry – Handles user registration, including face capture and storage
  - POST /api/login – Handles login by verifying credentials and live face capture
- Chat Room Management:
  - POST /api/chat/ – Creates a new chat room
  - POST /api/invitations – Sends an invitation email to join a chat room when a group or room is created
  - GET /api/messages – Retrieves past chat messages from the database (Redis)
- Speech-to-Text:
  - POST gcp_cloud_function_url – Converts recorded audio to text using the Google Speech-to-Text API; this runs as a separate service
- WebSocket server:
  - NEXT_PUBLIC_WS_URL=wss://socketserver, port 8080
- Frontend: Deployed on Vercel, leveraging Next.js for seamless SSR and API routes
- Backend WebSocket Server: Can be deployed on a Node.js hosting service or alongside the frontend if architecture permits. Ensure that WS_PORT and the other environment variables are set correctly.
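A deployment is less likely to fail silently if the port and the client-facing URL are validated before use. The sketch below shows one way to do that. The helper names `resolveWsPort` and `isWebSocketUrl` are hypothetical, and falling back to port 8080 is an assumption based on the example values in this README.

```javascript
// Hypothetical helper: reads WS_PORT, falling back to 8080 when unset,
// and rejects values that are not valid TCP port numbers.
function resolveWsPort(env = process.env, fallback = 8080) {
  const raw = env.WS_PORT;
  if (raw === undefined || raw === "") return fallback;
  const port = Number(raw);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid WS_PORT: ${raw}`);
  }
  return port;
}

// Hypothetical helper: checks that a value such as NEXT_PUBLIC_WS_URL
// parses as a URL with a ws: or wss: scheme.
function isWebSocketUrl(value) {
  try {
    const { protocol } = new URL(value);
    return protocol === "ws:" || protocol === "wss:";
  } catch {
    return false;
  }
}
```

In a hosted deployment the public URL would typically use wss:, while ws://localhost:8080 is suitable only for local development.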
For any questions or feedback, please reach out at swarnikarajsingh@gmail.com.