Commit bb65962

Authored Feb 20, 2021
Support of binary encoding
1 parent d09b31b commit bb65962

File tree

6 files changed

+342
-31
lines changed


‎README.md

+49-15
Original file line number | Diff line number | Diff line change
@@ -2,6 +2,8 @@
22

33
This project implements the lossless data compression technique called **arithmetic encoding (AE)**. The project is simple and has just some basic features.
44

5+
The project supports encoding the input as both a floating-point value and a binary code.
6+
57
The project has a main module called `pyae.py` which contains a class called `ArithmeticEncoding` to encode and decode messages.
68

79
# Usage Steps
@@ -12,7 +14,8 @@ To use the project, follow these steps:
1214
2. Instantiate the `ArithmeticEncoding` Class
1315
3. Prepare a Message
1416
4. Encode the Message
15-
5. Decode the Message
17+
5. Get the Binary Code of the Encoded Message
18+
6. Decode the Message
1619

1720
## Import `pyae`
1821

@@ -53,8 +56,17 @@ original_msg = "abc"
5356
Encode the message using the `encode()` method. It accepts the message to be encoded and the probability table. It returns the encoded message (a single floating-point value), the encoder stages, and the minimum and maximum values of the interval in which the encoded value falls.
5457

5558
```python
56-
encoded_msg, encoder = AE.encode(msg=original_msg,
57-
probability_table=AE.probability_table)
59+
encoded_msg, encoder , interval_min_value, interval_max_value = AE.encode(msg=original_msg,
60+
probability_table=AE.probability_table)
61+
```
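To make the returned interval concrete, here is a minimal sketch of the interval narrowing that arithmetic encoding performs, using exact fractions instead of `Decimal`. The helper name `final_interval` is hypothetical and is not part of `pyae.py`; the sketch assumes symbols are laid out in the insertion order of the frequency table.

```python
from fractions import Fraction


def final_interval(msg, frequency_table):
    """Hypothetical helper: narrow [0, 1) once per symbol and return the final bounds."""
    total = sum(frequency_table.values())
    low, high = Fraction(0), Fraction(1)
    for symbol in msg:
        width = high - low
        cumulative = Fraction(0)
        for term, freq in frequency_table.items():
            prob = Fraction(freq, total)
            if term == symbol:
                # Compute the new upper bound first, while `low` still holds the old value.
                high = low + width * (cumulative + prob)
                low = low + width * cumulative
                break
            cumulative += prob
        else:
            raise KeyError(symbol)
    return low, high


low, high = final_interval("abc", {"a": 2, "b": 7, "c": 1})
print(low, high, (low + high) / 2)  # 83/500 9/50 173/1000
```

For the message `abc` the final interval is [0.166, 0.18], and its midpoint 0.173 agrees with the encoded value printed by the example up to `Decimal` rounding.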
62+
63+
## Get the Binary Code of the Encoded Message
64+
65+
Convert the floating-point value returned from the `AE.encode()` function into a binary code using the `AE.encode_binary()` function.
66+
67+
```python
68+
binary_code, encoder_binary = AE.encode_binary(float_interval_min=interval_min_value,
69+
float_interval_max=interval_max_value)
5870
```
5971

6072
## Decode the Message
@@ -95,6 +107,7 @@ The [`example.py`](/example.py) script has an example that compresses the messag
95107
import pyae
96108

97109
# Example for encoding a simple text message using the PyAE module.
110+
# This example returns the floating-point value in addition to its binary code that encodes the message.
98111

99112
frequency_table = {"a": 2,
100113
"b": 7,
@@ -106,16 +119,22 @@ AE = pyae.ArithmeticEncoding(frequency_table=frequency_table,
106119
original_msg = "abc"
107120
print("Original Message: {msg}".format(msg=original_msg))
108121

109-
encoded_msg, encoder = AE.encode(msg=original_msg,
110-
probability_table=AE.probability_table)
122+
# Encode the message
123+
encoded_msg, encoder , interval_min_value, interval_max_value = AE.encode(msg=original_msg,
124+
probability_table=AE.probability_table)
111125
print("Encoded Message: {msg}".format(msg=encoded_msg))
112126

127+
# Get the binary code out of the floating-point value
128+
binary_code, encoder_binary = AE.encode_binary(float_interval_min=interval_min_value,
129+
float_interval_max=interval_max_value)
130+
print("The binary code is: {binary_code}".format(binary_code=binary_code))
131+
132+
# Decode the message
113133
decoded_msg, decoder = AE.decode(encoded_msg=encoded_msg,
114134
msg_length=len(original_msg),
115135
probability_table=AE.probability_table)
116-
print("Decoded Message: {msg}".format(msg=decoded_msg))
117-
118136
decoded_msg = "".join(decoded_msg)
137+
print("Decoded Message: {msg}".format(msg=decoded_msg))
119138
print("Message Decoded Successfully? {result}".format(result=original_msg == decoded_msg))
120139
```
121140

@@ -124,6 +143,7 @@ The printed messages out of the code are:
124143
```
125144
Original Message: abc
126145
Encoded Message: 0.1729999999999999989175325511
146+
The binary code is: 0.0010110
127147
Decoded Message: abc
128148
Message Decoded Successfully? True
129149
```
@@ -161,6 +181,22 @@ print(encoder)
161181
Decimal('0.5599999999999999349409307570')]}]
162182
```
163183

184+
Here is the binary encoder:
185+
186+
```python
187+
print(encoder_binary)
188+
```
189+
190+
```python
191+
[{0: ['0.0', '0.1'], 1: ['0.1', '1.0']},
192+
{0: ['0.00', '0.01'], 1: ['0.01', '0.1']},
193+
{0: ['0.000', '0.001'], 1: ['0.001', '0.01']},
194+
{0: ['0.0010', '0.0011'], 1: ['0.0011', '0.01']},
195+
{0: ['0.00100', '0.00101'], 1: ['0.00101', '0.0011']},
196+
{0: ['0.001010', '0.001011'], 1: ['0.001011', '0.0011']},
197+
{0: ['0.0010110', '0.0010111'], 1: ['0.0010111', '0.0011']}]
198+
```
199+
164200
## Low Precision
165201

166202
Assume the message to be encoded is `"abc"*20` (i.e. `abc` repeated 20 times) while using the default precision 28. The length of the message is 60.
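A rough way to see why 28 digits are not enough here: the width of the final interval equals the product of the probabilities of all symbols in the message. The sketch below computes that width exactly for the frequency table used in the examples and estimates how many decimal digits are needed to resolve it. The estimate is only a lower bound on the required precision, not the value the README recommends.

```python
from fractions import Fraction
import math

frequency_table = {"a": 2, "b": 7, "c": 1}
total = sum(frequency_table.values())
msg = "abc" * 20

# The final interval width is the product of the per-symbol probabilities.
width = Fraction(1)
for symbol in msg:
    width *= Fraction(frequency_table[symbol], total)

# Decimal digits needed before two distinct values inside the interval can be told apart.
digits_needed = math.ceil(-math.log10(float(width)))
print(digits_needed)  # 38
```

Since the width is far below 10^-28, the default precision cannot represent distinct bounds for the last stages of this message.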
@@ -184,16 +220,15 @@ AE = pyae.ArithmeticEncoding(frequency_table=frequency_table,
184220
original_msg = "abc"*20
185221
print("Original Message: {msg}".format(msg=original_msg))
186222

187-
encoded_msg, encoder = AE.encode(msg=original_msg,
188-
probability_table=AE.probability_table)
223+
encoded_msg, encoder , interval_min_value, interval_max_value = AE.encode(msg=original_msg,
224+
probability_table=AE.probability_table)
189225
print("Encoded Message: {msg}".format(msg=encoded_msg))
190226

191227
decoded_msg, decoder = AE.decode(encoded_msg=encoded_msg,
192228
msg_length=len(original_msg),
193229
probability_table=AE.probability_table)
194-
print("Decoded Message: {msg}".format(msg=decoded_msg))
195-
196230
decoded_msg = "".join(decoded_msg)
231+
print("Decoded Message: {msg}".format(msg=decoded_msg))
197232
print("Message Decoded Successfully? {result}".format(result=original_msg == decoded_msg))
198233
```
199234

@@ -232,16 +267,15 @@ AE = pyae.ArithmeticEncoding(frequency_table=frequency_table,
232267
original_msg = "abc"*20
233268
print("Original Message: {msg}".format(msg=original_msg))
234269

235-
encoded_msg, encoder = AE.encode(msg=original_msg,
236-
probability_table=AE.probability_table)
270+
encoded_msg, encoder , interval_min_value, interval_max_value = AE.encode(msg=original_msg,
271+
probability_table=AE.probability_table)
237272
print("Encoded Message: {msg}".format(msg=encoded_msg))
238273

239274
decoded_msg, decoder = AE.decode(encoded_msg=encoded_msg,
240275
msg_length=len(original_msg),
241276
probability_table=AE.probability_table)
242-
print("Decoded Message: {msg}".format(msg=decoded_msg))
243-
244277
decoded_msg = "".join(decoded_msg)
278+
print("Decoded Message: {msg}".format(msg=decoded_msg))
245279
print("Message Decoded Successfully? {result}".format(result=original_msg == decoded_msg))
246280
```
247281

‎example.py

+4-2
@@ -1,6 +1,8 @@
11
import pyae
22

33
# Example for encoding a simple text message using the PyAE module.
4+
# This example only returns the floating-point value that encodes the message.
5+
# Check the example_binary.py to return the binary code of the floating-point value.
46

57
frequency_table = {"a": 2,
68
"b": 7,
@@ -12,8 +14,8 @@
1214
original_msg = "abc"
1315
print("Original Message: {msg}".format(msg=original_msg))
1416

15-
encoded_msg, encoder = AE.encode(msg=original_msg,
16-
probability_table=AE.probability_table)
17+
encoded_msg, encoder , interval_min_value, interval_max_value = AE.encode(msg=original_msg,
18+
probability_table=AE.probability_table)
1719
print("Encoded Message: {msg}".format(msg=encoded_msg))
1820

1921
decoded_msg, decoder = AE.decode(encoded_msg=encoded_msg,

‎example_binary.py

+33
@@ -0,0 +1,33 @@
1+
import pyae
2+
3+
# Example for encoding a simple text message using the PyAE module.
4+
# This example returns the floating-point value in addition to its binary code that encodes the message.
5+
6+
frequency_table = {"a": 2,
7+
"b": 7,
8+
"c": 1}
9+
10+
AE = pyae.ArithmeticEncoding(frequency_table=frequency_table,
11+
save_stages=True)
12+
13+
original_msg = "abc"
14+
print("Original Message: {msg}".format(msg=original_msg))
15+
16+
# Encode the message
17+
encoded_msg, encoder , interval_min_value, interval_max_value = AE.encode(msg=original_msg,
18+
probability_table=AE.probability_table)
19+
print("Encoded Message: {msg}".format(msg=encoded_msg))
20+
21+
# Get the binary code out of the floating-point value
22+
binary_code, encoder_binary = AE.encode_binary(float_interval_min=interval_min_value,
23+
float_interval_max=interval_max_value)
24+
print("The binary code is: {binary_code}".format(binary_code=binary_code))
25+
26+
# Decode the message
27+
decoded_msg, decoder = AE.decode(encoded_msg=encoded_msg,
28+
msg_length=len(original_msg),
29+
probability_table=AE.probability_table)
30+
decoded_msg = "".join(decoded_msg)
31+
print("Decoded Message: {msg}".format(msg=decoded_msg))
32+
33+
print("Message Decoded Successfully? {result}".format(result=original_msg == decoded_msg))

‎example_image.py

+11-7
@@ -3,15 +3,19 @@
33
import numpy
44
import matplotlib.pyplot
55

6+
# Example for encoding an image using the PyAE module.
7+
# This example only returns the floating-point value that encodes the image.
8+
# Check the example_image_binary.py to return the binary code of the floating-point value.
9+
610
# Change the precision to a bigger value
711
from decimal import getcontext
8-
getcontext().prec = 10000
12+
getcontext().prec = 444
913

1014
# Read an image.
1115
im = scipy.misc.face(gray=True)
1216

1317
# Just work on a small part to save time. The larger the image, the more time consumed.
14-
im = im[:50, :50]
18+
im = im[:15, :15]
1519

1620
# Convert the image into a 1D vector.
1721
msg = im.flatten()
@@ -25,13 +29,13 @@
2529
AE = pyae.ArithmeticEncoding(frequency_table=frequency_table)
2630

2731
# Encode the message
28-
encoded_msg, _ = AE.encode(msg=msg,
29-
probability_table=AE.probability_table)
32+
encoded_msg, encoder, interval_min_value, interval_max_value = AE.encode(msg=msg,
33+
probability_table=AE.probability_table)
3034

3135
# Decode the message
32-
decoded_msg, _ = AE.decode(encoded_msg=encoded_msg,
33-
msg_length=len(msg),
34-
probability_table=AE.probability_table)
36+
decoded_msg, decoder = AE.decode(encoded_msg=encoded_msg,
37+
msg_length=len(msg),
38+
probability_table=AE.probability_table)
3539

3640
# Reshape the image to its original shape.
3741
decoded_msg = numpy.reshape(decoded_msg, im.shape)

‎example_image_binary.py

+56
@@ -0,0 +1,56 @@
1+
import scipy.misc
2+
import pyae
3+
import numpy
4+
import matplotlib.pyplot
5+
6+
# Example for encoding an image using the PyAE module.
7+
# This example returns the floating-point value in addition to its binary code that encodes the image.
8+
9+
# Change the precision to a bigger value
10+
from decimal import getcontext
11+
getcontext().prec = 444
12+
13+
# Read an image.
14+
im = scipy.misc.face(gray=True)
15+
16+
# Just work on a small part to save time. The larger the image, the more time consumed.
17+
im = im[:15, :15]
18+
19+
# Convert the image into a 1D vector.
20+
msg = im.flatten()
21+
22+
# Create the frequency table based on its histogram.
23+
hist, bin_edges = numpy.histogram(a=im,
24+
bins=range(0, 257))
25+
frequency_table = {key: value for key, value in zip(bin_edges[0:256], hist)}
26+
27+
# Create an instance of the ArithmeticEncoding class.
28+
AE = pyae.ArithmeticEncoding(frequency_table=frequency_table, save_stages=True)
29+
30+
# Encode the message
31+
encoded_msg, encoder, interval_min_value, interval_max_value = AE.encode(msg=msg,
32+
probability_table=AE.probability_table)
33+
34+
# Get the binary code that encodes the image
35+
binary_code, encoder_binary = AE.encode_binary(float_interval_min=interval_min_value,
36+
float_interval_max=interval_max_value)
37+
print("The binary code is: {binary_code}".format(binary_code=binary_code))
38+
39+
# Decode the message
40+
decoded_msg, decoder = AE.decode(encoded_msg=encoded_msg,
41+
msg_length=len(msg),
42+
probability_table=AE.probability_table)
43+
44+
# Reshape the image to its original shape.
45+
decoded_msg = numpy.reshape(decoded_msg, im.shape)
46+
47+
# Show the original and decoded images.
48+
fig, ax = matplotlib.pyplot.subplots(1, 2)
49+
ax[0].imshow(im, cmap="gray")
50+
ax[0].set_title("Original Image")
51+
ax[0].set_xticks([])
52+
ax[0].set_yticks([])
53+
ax[1].imshow(decoded_msg, cmap="gray")
54+
ax[1].set_title("Reconstructed Image")
55+
ax[1].set_xticks([])
56+
ax[1].set_yticks([])

‎pyae.py

+189-7
@@ -1,4 +1,4 @@
1-
from decimal import Decimal # Used to offer any user-defined precision.
1+
from decimal import Decimal
22

33
class ArithmeticEncoding:
44
"""
@@ -20,6 +20,10 @@ def __init__(self, frequency_table, save_stages=False):
2020
def get_probability_table(self, frequency_table):
2121
"""
2222
Calculates the probability table out of the frequency table.
23+
24+
frequency_table: A table of the term frequencies.
25+
26+
Returns the probability table.
2327
"""
2428
total_frequency = sum(list(frequency_table.values()))
2529

@@ -32,6 +36,10 @@ def get_probability_table(self, frequency_table):
3236
def get_encoded_value(self, last_stage_probs):
3337
"""
3438
After encoding the entire message, this method returns the single value that represents the entire message.
39+
40+
last_stage_probs: A list of the probabilities in the last stage.
41+
42+
Returns the minimum and maximum probabilities in the last stage in addition to the value encoding the message.
3543
"""
3644
last_stage_probs = list(last_stage_probs.values())
3745
last_stage_values = []
@@ -41,13 +49,21 @@ def get_encoded_value(self, last_stage_probs):
4149

4250
last_stage_min = min(last_stage_values)
4351
last_stage_max = max(last_stage_values)
52+
encoded_value = (last_stage_min + last_stage_max)/2
4453

45-
return (last_stage_min + last_stage_max)/2
54+
return last_stage_min, last_stage_max, encoded_value
4655

4756
def process_stage(self, probability_table, stage_min, stage_max):
4857
"""
4958
Processing a stage in the encoding/decoding process.
59+
60+
probability_table: The probability table.
61+
stage_min: The minimum probability of the current stage.
62+
stage_max: The maximum probability of the current stage.
63+
64+
Returns the probabilities in the stage.
5065
"""
66+
5167
stage_probs = {}
5268
stage_domain = stage_max - stage_min
5369
for term_idx in range(len(probability_table.items())):
@@ -60,10 +76,14 @@ def process_stage(self, probability_table, stage_min, stage_max):
6076

6177
def encode(self, msg, probability_table):
6278
"""
63-
Encodes a message.
79+
Encodes a message using arithmetic encoding.
80+
81+
msg: The message to be encoded.
82+
probability_table: The probability table.
83+
84+
Returns the floating-point value representing the encoded message, the encoder stages, and the minimum and maximum values of the interval in which the floating-point value falls.
6485
"""
6586

66-
# Make sure
6787
msg = list(msg)
6888

6989
encoder = []
@@ -86,13 +106,98 @@ def encode(self, msg, probability_table):
86106
if self.save_stages:
87107
encoder.append(last_stage_probs)
88108

89-
encoded_msg = self.get_encoded_value(last_stage_probs)
109+
interval_min_value, interval_max_value, encoded_msg = self.get_encoded_value(last_stage_probs)
110+
111+
return encoded_msg, encoder, interval_min_value, interval_max_value
112+
113+
def process_stage_binary(self, float_interval_min, float_interval_max, stage_min_bin, stage_max_bin):
114+
"""
115+
Processing a stage in the encoding/decoding process.
116+
117+
float_interval_min: The minimum floating-point value in the interval in which the floating-point value that encodes the message is located.
118+
float_interval_max: The maximum floating-point value in the interval in which the floating-point value that encodes the message is located.
119+
stage_min_bin: The minimum binary number in the current stage.
120+
stage_max_bin: The maximum binary number in the current stage.
121+
122+
Returns the two binary sub-intervals of this stage, keyed by the bits 0 and 1.
123+
"""
124+
125+
stage_mid_bin = stage_min_bin + "1"
126+
stage_min_bin = stage_min_bin + "0"
127+
128+
stage_probs = {}
129+
stage_probs[0] = [stage_min_bin, stage_mid_bin]
130+
stage_probs[1] = [stage_mid_bin, stage_max_bin]
131+
132+
return stage_probs
133+
134+
def encode_binary(self, float_interval_min, float_interval_max):
135+
"""
136+
Calculates the binary code that represents the floating-point value that encodes the message.
137+
138+
float_interval_min: The minimum floating-point value in the interval in which the floating-point value that encodes the message is located.
139+
float_interval_max: The maximum floating-point value in the interval in which the floating-point value that encodes the message is located.
140+
141+
Returns the binary code representing the encoded message and the binary encoder stages.
142+
"""
143+
144+
binary_encoder = []
145+
binary_code = None
146+
147+
stage_min_bin = "0.0"
148+
stage_max_bin = "1.0"
90149

91-
return encoded_msg, encoder
150+
stage_probs = {}
151+
stage_probs[0] = [stage_min_bin, "0.1"]
152+
stage_probs[1] = ["0.1", stage_max_bin]
153+
154+
while True:
155+
if float_interval_max < bin2float(stage_probs[0][1]):
156+
stage_min_bin = stage_probs[0][0]
157+
stage_max_bin = stage_probs[0][1]
158+
else:
159+
stage_min_bin = stage_probs[1][0]
160+
stage_max_bin = stage_probs[1][1]
161+
162+
if self.save_stages:
163+
binary_encoder.append(stage_probs)
164+
165+
stage_probs = self.process_stage_binary(float_interval_min,
166+
float_interval_max,
167+
stage_min_bin,
168+
stage_max_bin)
169+
170+
# print(stage_probs[0][0], bin2float(stage_probs[0][0]))
171+
# print(stage_probs[0][1], bin2float(stage_probs[0][1]))
172+
if (bin2float(stage_probs[0][0]) >= float_interval_min) and (bin2float(stage_probs[0][1]) < float_interval_max):
173+
# The binary code is found.
174+
# print(stage_probs[0][0], bin2float(stage_probs[0][0]))
175+
# print(stage_probs[0][1], bin2float(stage_probs[0][1]))
176+
# print("The binary code is : ", stage_probs[0][0])
177+
binary_code = stage_probs[0][0]
178+
break
179+
elif (bin2float(stage_probs[1][0]) >= float_interval_min) and (bin2float(stage_probs[1][1]) < float_interval_max):
180+
# The binary code is found.
181+
# print(stage_probs[1][0], bin2float(stage_probs[1][0]))
182+
# print(stage_probs[1][1], bin2float(stage_probs[1][1]))
183+
# print("The binary code is : ", stage_probs[1][0])
184+
binary_code = stage_probs[1][0]
185+
break
186+
187+
if self.save_stages:
188+
binary_encoder.append(stage_probs)
189+
190+
return binary_code, binary_encoder
92191

93192
def decode(self, encoded_msg, msg_length, probability_table):
94193
"""
95-
Decodes a message.
194+
Decodes a message from a floating-point number.
195+
196+
encoded_msg: The floating-point value that encodes the message.
197+
msg_length: Length of the message.
198+
probability_table: The probability table.
199+
200+
Returns the decoded message and the decoder stages.
96201
"""
97202

98203
decoder = []
@@ -122,3 +227,80 @@ def decode(self, encoded_msg, msg_length, probability_table):
122227
decoder.append(last_stage_probs)
123228

124229
return decoded_msg, decoder
230+
231+
def float2bin(float_num, num_bits=None):
232+
"""
233+
Converts a floating-point number into binary.
234+
235+
float_num: The floating-point number.
236+
num_bits: The number of bits expected in the result. If None, then the number of bits depends on the number.
237+
238+
Returns the binary representation of the number.
239+
"""
240+
241+
float_num = str(float_num)
242+
if float_num.find(".") == -1:
243+
# No decimals in the floating-point number.
244+
integers = float_num
245+
decimals = ""
246+
else:
247+
integers, decimals = float_num.split(".")
248+
decimals = "0." + decimals
249+
decimals = Decimal(decimals)
250+
integers = int(integers)
251+
252+
result = ""
253+
num_used_bits = 0
254+
while True:
255+
mul = decimals * 2
256+
int_part = int(mul)
257+
result = result + str(int_part)
258+
num_used_bits = num_used_bits + 1
259+
260+
decimals = mul - int(mul)
261+
if type(num_bits) is type(None):
262+
if decimals == 0:
263+
break
264+
elif num_used_bits >= num_bits:
265+
break
266+
if type(num_bits) is type(None):
267+
pass
268+
elif len(result) < num_bits:
269+
num_remaining_bits = num_bits - len(result)
270+
result = result + "0"*num_remaining_bits
271+
272+
integers_bin = bin(integers)[2:]
273+
result = str(integers_bin) + "." + str(result)
274+
return result
275+
276+
def bin2float(bin_num):
277+
"""
278+
Converts a binary number to a floating-point number.
279+
280+
bin_num: The binary number as a string.
281+
282+
Returns the floating-point representation.
283+
"""
284+
285+
if bin_num.find(".") == -1:
286+
# No decimals in the binary number.
287+
integers = bin_num
288+
decimals = ""
289+
else:
290+
integers, decimals = bin_num.split(".")
291+
result = Decimal(0.0)
292+
293+
# Working with integers.
294+
for idx, bit in enumerate(integers):
295+
if bit == "0":
296+
continue
297+
mul = 2**idx
298+
result = result + Decimal(mul)
299+
300+
# Working with decimals.
301+
for idx, bit in enumerate(decimals):
302+
if bit == "0":
303+
continue
304+
mul = Decimal(1.0)/Decimal((2**(idx+1)))
305+
result = result + mul
306+
return result
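
The two conversion helpers added at the end of `pyae.py` can be sanity-checked with a small round-trip sketch. The functions below are hypothetical stand-ins written with `fractions.Fraction`, not the commit's code. One difference is deliberate: they read the integer part most-significant bit first via `int(..., 2)`, whereas the `bin2float` shown in the diff appears to weight integer bits from left to right starting at `2**0`, which only matches for a single-digit integer part (the only case `encode_binary()` passes in). Non-negative inputs are assumed.

```python
from fractions import Fraction


def bin_to_fraction(bin_num):
    """Hypothetical helper: convert a binary string such as '10.1' to an exact Fraction."""
    integers, _, decimals = bin_num.partition(".")
    value = Fraction(int(integers, 2)) if integers else Fraction(0)
    for idx, bit in enumerate(decimals, start=1):
        if bit == "1":
            value += Fraction(1, 2 ** idx)
    return value


def fraction_to_bin(value, max_bits=64):
    """Hypothetical helper: convert a non-negative Fraction to a binary string."""
    integer_part = int(value)
    frac = value - integer_part
    bits = ""
    # Repeatedly double the fractional part; each integer carry is the next bit.
    while frac != 0 and len(bits) < max_bits:
        frac *= 2
        bit = int(frac)
        bits += str(bit)
        frac -= bit
    return bin(integer_part)[2:] + "." + (bits or "0")


print(bin_to_fraction("0.0010110"))        # 11/64
print(fraction_to_bin(Fraction(11, 64)))   # 0.001011
print(bin_to_fraction("10.1"))             # 5/2
```

The value 11/64 equals 0.171875, which lies inside the interval [0.166, 0.18] obtained for the message `abc`.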
