Commit 4a7e1db

charlesw#579 Updated to tesseract 5.2
1 parent 8c08c79 commit 4a7e1db

18 files changed: +107 -40 lines changed

ChangeLog.md

+2-1
Original file line number | Diff line number | Diff line change
@@ -1,5 +1,5 @@
11
### Version 5.0
2-
* Upgraded to Tesseract 5.0 [Issue 579](https://github.com/charlesw/tesseract/issues/579)
2+
* Upgraded to Tesseract 5.2 [Issue 579](https://github.com/charlesw/tesseract/issues/579)
33
* Fixed dynamic linking on macOS [Issue 588](https://github.com/charlesw/tesseract/issues/588)
44
* Fixed null reference exception when executing assembly is not available [Issue 591](https://github.com/charlesw/tesseract/issues/591)
55

@@ -8,6 +8,7 @@
88
* Setting regions of interest doesn't work [Issue 489](https://github.com/charlesw/tesseract/issues/489)
99
* PageSegMode.SingleBlockVertText does not work [Issue 490](https://github.com/charlesw/tesseract/issues/490)
1010
* Unz files don't work [Issue 594](https://github.com/charlesw/tesseract/issues/594)
11+
* Removed support for dotnet 4.0 and 4.5
1112

1213
### Version 4.1.1
1314

docs/Compling_tesseract_and_leptonica.md

+9-9
@@ -10,7 +10,7 @@ The following also differ from [[Compiling-Tesseract-and-Leptonica]] in that the
1010
The main benefit of this is that it's possible to compile tesseract against the leptonica dll rather than statically
1111
linking leptonica into tesseract which increases file size (since the leptonica dll is still required).
1212

13-
1. Install Visual Studio 2019
13+
1. Install Visual Studio 2022
1414
2. Install CMake (ensure it's on your path)
1515
3. Install [vcpkg](https://github.com/Microsoft/vcpkg/)
1616
* Note: I also set an environment variable VCPKG_HOME to this directory and added it to path for convenience
@@ -21,13 +21,13 @@ linking leptonica into tesseract which increases file size (since the leptonica
2121
vcpkg install giflib:x86-windows-static libjpeg-turbo:x86-windows-static liblzma:x86-windows-static libpng:x86-windows-static tiff:x86-windows-static zlib:x86-windows-static
2222
vcpkg install giflib:x64-windows-static libjpeg-turbo:x64-windows-static liblzma:x64-windows-static libpng:x64-windows-static tiff:x64-windows-static zlib:x64-windows-static
2323
git clone https://github.com/DanBloomberg/leptonica.git & cd leptonica
24-
git checkout -b 1.80.0 1.80.0
24+
git checkout -b 1.82.0 1.82.0
2525
mkdir vs16-x86 & cd vs16-x86
26-
cmake .. -G "Visual Studio 16 2019" -A Win32 -DSW_BUILD=OFF -DBUILD_SHARED_LIBS=ON -DCMAKE_TOOLCHAIN_FILE=%VCPKG_HOME%\scripts\buildsystems\vcpkg.cmake -DVCPKG_TARGET_TRIPLET=x86-windows-static -DCMAKE_INSTALL_PREFIX=..\..\build\x86
26+
cmake .. -G "Visual Studio 17 2022" -A Win32 -DSW_BUILD=OFF -DBUILD_SHARED_LIBS=ON -DCMAKE_TOOLCHAIN_FILE=%VCPKG_HOME%\scripts\buildsystems\vcpkg.cmake -DVCPKG_TARGET_TRIPLET=x86-windows-static -DCMAKE_INSTALL_PREFIX=..\..\build\x86
2727
cmake --build . --config Release --target install
2828
cd ..
2929
mkdir vs16-x64 & cd vs16-x64
30-
cmake .. -G "Visual Studio 16 2019" -A x64 -DSW_BUILD=OFF -DBUILD_SHARED_LIBS=ON -DCMAKE_TOOLCHAIN_FILE=%VCPKG_HOME%\scripts\buildsystems\vcpkg.cmake -DVCPKG_TARGET_TRIPLET=x64-windows-static -DCMAKE_INSTALL_PREFIX=..\..\build\x64
30+
cmake .. -G "Visual Studio 17 2022" -A x64 -DSW_BUILD=OFF -DBUILD_SHARED_LIBS=ON -DCMAKE_TOOLCHAIN_FILE=%VCPKG_HOME%\scripts\buildsystems\vcpkg.cmake -DVCPKG_TARGET_TRIPLET=x64-windows-static -DCMAKE_INSTALL_PREFIX=..\..\build\x64
3131
cmake --build . --config Release --target install
3232
```
3333
5. Build Tesseract:
@@ -36,13 +36,13 @@ linking leptonica into tesseract which increases file size (since the leptonica
3636
```
3737
git clone https://github.com/tesseract-ocr/tesseract.git
3838
cd tesseract
39-
git checkout -b 4.1.1 4.1.1
40-
mkdir vs16-x86 & cd vs16-x86
41-
cmake .. -G "Visual Studio 16 2019" -A Win32 -DAUTO_OPTIMIZE=OFF -DSW_BUILD=OFF -DBUILD_TRAINING_TOOLS=OFF -DCMAKE_INSTALL_PREFIX=..\..\build\x86
39+
git checkout -b 5.2.0 5.2.0
40+
mkdir vs17-x86 & cd vs17-x86
41+
cmake .. -G "Visual Studio 17 2022" -A Win32 -DAUTO_OPTIMIZE=OFF -DSW_BUILD=OFF -DBUILD_TRAINING_TOOLS=OFF -DCMAKE_INSTALL_PREFIX=..\..\build\x86
4242
cmake --build . --config Release --target install
4343
cd ..
44-
mkdir vs16-x64 & cd vs16-x64
45-
cmake .. -G "Visual Studio 16 2019" -A x64 -DAUTO_OPTIMIZE=OFF -DSW_BUILD=OFF -DBUILD_TRAINING_TOOLS=OFF -DCMAKE_INSTALL_PREFIX=..\..\build\x64
44+
mkdir vs17-x64 & cd vs17-x64
45+
cmake .. -G "Visual Studio 17 2022" -A x64 -DAUTO_OPTIMIZE=OFF -DSW_BUILD=OFF -DBUILD_TRAINING_TOOLS=OFF -DCMAKE_INSTALL_PREFIX=..\..\build\x64
4646
cmake --build . --config Release --target install
4747
```
4848

@@ -0,0 +1,59 @@
1+
# Compiling tesseract and leptonica
2+
* [Index](./ReadMe.md)
3+
4+
## Notes
5+
Build instructions for Tesseract 5.2.0 and leptonica 1.82.0. Please note that build systems do change, so while the following
6+
has been tested with the listed versions building against any other versions including master may not work as expected and
7+
aren't supported.
8+
9+
The following also differ from [[Compiling-Tesseract-and-Leptonica]] in that they use vcpkg to manage the dependencies.
10+
The main benefit of this is that it's possible to compile tesseract against the leptonica dll rather than statically
11+
linking leptonica into tesseract which increases file size (since the leptonica dll is still required).
12+
13+
1. Install Visual Studio 2022
14+
2. Install CMake (ensure it's on your path)
15+
3. Install [vcpkg](https://github.com/Microsoft/vcpkg/)
16+
* Note: I also set an environment variable VCPKG_HOME to this directory and added it to path for convenience
17+
18+
4. Build Leptonica:
19+
20+
```
21+
vcpkg install giflib:x86-windows-static libjpeg-turbo:x86-windows-static liblzma:x86-windows-static libpng:x86-windows-static tiff:x86-windows-static zlib:x86-windows-static
22+
vcpkg install giflib:x64-windows-static libjpeg-turbo:x64-windows-static liblzma:x64-windows-static libpng:x64-windows-static tiff:x64-windows-static zlib:x64-windows-static
23+
git clone https://github.com/DanBloomberg/leptonica.git & cd leptonica
24+
git checkout -b 1.82.0 1.82.0
25+
mkdir vs16-x86 & cd vs16-x86
26+
cmake .. -G "Visual Studio 17 2022" -A Win32 -DSW_BUILD=OFF -DBUILD_SHARED_LIBS=ON -DCMAKE_TOOLCHAIN_FILE=%VCPKG_HOME%\scripts\buildsystems\vcpkg.cmake -DVCPKG_TARGET_TRIPLET=x86-windows-static -DCMAKE_INSTALL_PREFIX=..\..\build\x86
27+
cmake --build . --config Release --target install
28+
cd ..
29+
mkdir vs16-x64 & cd vs16-x64
30+
cmake .. -G "Visual Studio 17 2022" -A x64 -DSW_BUILD=OFF -DBUILD_SHARED_LIBS=ON -DCMAKE_TOOLCHAIN_FILE=%VCPKG_HOME%\scripts\buildsystems\vcpkg.cmake -DVCPKG_TARGET_TRIPLET=x64-windows-static -DCMAKE_INSTALL_PREFIX=..\..\build\x64
31+
cmake --build . --config Release --target install
32+
```
33+
5. Build Tesseract:
34+
35+
36+
```
37+
git clone https://github.com/tesseract-ocr/tesseract.git
38+
cd tesseract
39+
git checkout -b 5.2.0 5.2.0
40+
mkdir vs17-x86 & cd vs17-x86
41+
cmake .. -G "Visual Studio 17 2022" -A Win32 -DAUTO_OPTIMIZE=OFF -DSW_BUILD=OFF -DBUILD_TRAINING_TOOLS=OFF -DCMAKE_INSTALL_PREFIX=..\..\build\x86
42+
cmake --build . --config Release --target install
43+
cd ..
44+
mkdir vs17-x64 & cd vs17-x64
45+
cmake .. -G "Visual Studio 17 2022" -A x64 -DAUTO_OPTIMIZE=OFF -DSW_BUILD=OFF -DBUILD_TRAINING_TOOLS=OFF -DCMAKE_INSTALL_PREFIX=..\..\build\x64
46+
cmake --build . --config Release --target install
47+
```
48+
49+
### Leptonica Notes:
50+
51+
* Leptonica now needs to be built to use shared libraries (dlls) explicitly; this is accomplished by setting ``BUILD_SHARED_LIBS`` to ``ON`` (``-DBUILD_SHARED_LIBS=ON``).
52+
* Using [Self build](https://github.com/SoftwareNetwork/sw) hasn't been tested and is disabled using ``SW_BUILD=OFF``.
53+
54+
### Tesseract Notes:
55+
56+
* For portability, architecture optimizations have been disabled using ``-DAUTO_OPTIMIZE=OFF``.
57+
This however will disable platform specific optimizations (AVX, SSE4.1, etc) which would likely
58+
result in better performance if you're guaranteed they will be available.
59+
* As with leptonica, Self Build has also been disabled using ``-DSW_BUILD=OFF``.

src/InternalTrace.3044.log

Whitespace-only changes.

src/InternalTrace.3144.log

Whitespace-only changes.

src/InternalTrace.3536.log

Whitespace-only changes.

src/InternalTrace.7132.log

Whitespace-only changes.

src/InternalTrace.8476.log

Whitespace-only changes.

src/Tesseract.Drawing/Tesseract.Drawing.csproj

+3-3
@@ -14,8 +14,8 @@
1414
<PackageProjectUrl>https://github.com/charlesw/tesseract/</PackageProjectUrl>
1515
<RepositoryUrl>https://github.com/charlesw/tesseract/</RepositoryUrl>
1616
<PackageTags>Tesseract Ocr</PackageTags>
17-
<Version>4.1.1</Version>
18-
<AssemblyVersion>4.1.1.0</AssemblyVersion>
17+
<Version>5.2.0</Version>
18+
<AssemblyVersion>5.2.0</AssemblyVersion>
1919
<NeutralLanguage></NeutralLanguage>
2020
<PackageLicenseExpression>Apache-2.0</PackageLicenseExpression>
2121
<RootNamespace>Tesseract</RootNamespace>
@@ -35,7 +35,7 @@
3535

3636

3737
<ItemGroup>
38-
<PackageReference Include="System.Drawing.Common" Version="5.0.0" />
38+
<PackageReference Include="System.Drawing.Common" Version="6.0.0" />
3939
</ItemGroup>
4040

4141
<ItemGroup>

src/Tesseract.Net48Tests/Tesseract.Net48Tests.csproj

+9-9
@@ -28,8 +28,8 @@
2828
<Compile Include="..\Tesseract.Tests\Leptonica\PixATests.cs" Link="Leptonica\PixATests.cs" />
2929
<Compile Include="..\Tesseract.Tests\Leptonica\PixTests\ImageManipulationTests.cs" Link="Leptonica\PixTests\ImageManipulationTests.cs" />
3030
<Compile Include="..\Tesseract.Tests\Leptonica\PixTests\PixDataAccessTests.cs" Link="Leptonica\PixTests\PixDataAccessTests.cs" />
31-
<Compile Include="..\Tesseract.Tests\ResultIteratorTests\FontAttributesTests.cs" Link="ResultIteratorTests\FontAttributesTests.cs" />
32-
<Compile Include="..\Tesseract.Tests\ResultIteratorTests\OfAnEmptyPixTests.cs" Link="ResultIteratorTests\OfAnEmptyPixTests.cs" />
31+
<Compile Include="..\Tesseract.Tests\ResultIteratorTests\FontAttributesTests.cs" Link="ResultIteratorTests\FontAttributesTests.cs" />
32+
<Compile Include="..\Tesseract.Tests\ResultIteratorTests\OfAnEmptyPixTests.cs" Link="ResultIteratorTests\OfAnEmptyPixTests.cs" />
3333
<Compile Include="..\Tesseract.Tests\PageSerializer.cs" Link="PageSerializer.cs" />
3434
<Compile Include="..\Tesseract.Tests\ResultRendererTests.cs" Link="ResultRendererTests.cs" />
3535
<Compile Include="..\Tesseract.Tests\TesseractResultSet.cs" Link="TesseractResultSet.cs" />
@@ -40,12 +40,12 @@
4040
</ItemGroup>
4141

4242
<ItemGroup>
43-
<PackageReference Include="nunit" Version="3.12.0" />
44-
<PackageReference Include="NUnit3TestAdapter" Version="3.17.0">
43+
<PackageReference Include="nunit" Version="3.13.3" />
44+
<PackageReference Include="NUnit3TestAdapter" Version="4.3.0">
4545
<PrivateAssets>all</PrivateAssets>
4646
<IncludeAssets>runtime; build; native; contentfiles; analyzers; buildtransitive</IncludeAssets>
4747
</PackageReference>
48-
<PackageReference Include="Microsoft.NET.Test.Sdk" Version="16.8.0" />
48+
<PackageReference Include="Microsoft.NET.Test.Sdk" Version="17.3.2" />
4949
</ItemGroup>
5050

5151
<ItemGroup>
@@ -210,12 +210,12 @@
210210
</ItemGroup>
211211

212212
<Target Name="SymlinkLinuxDependencies" AfterTargets="AfterBuild" Condition=" '$([System.Runtime.InteropServices.RuntimeInformation]::IsOSPlatform($([System.Runtime.InteropServices.OSPlatform]::Linux)))' ">
213-
<Exec Command="ln -sf /usr/lib/x86_64-linux-gnu/liblept.so $(OutDir)x64/libleptonica-1.80.0.so"/>
214-
<Exec Command="ln -sf /usr/lib/x86_64-linux-gnu/libtesseract.so.4 $(OutDir)x64/libtesseract41.so"/>
213+
<Exec Command="ln -sf /usr/lib/x86_64-linux-gnu/liblept.so $(OutDir)x64/libleptonica-1.80.0.so" />
214+
<Exec Command="ln -sf /usr/lib/x86_64-linux-gnu/libtesseract.so.4 $(OutDir)x64/libtesseract41.so" />
215215
</Target>
216216

217217
<Target Name="SymlinkMacOSDependencies" AfterTargets="AfterBuild" Condition=" '$([System.Runtime.InteropServices.RuntimeInformation]::IsOSPlatform($([System.Runtime.InteropServices.OSPlatform]::OSX)))' ">
218-
<Exec Command="ln -sf /usr/local/lib/liblept.dylib $(OutDir)x64/libleptonica-1.80.0.dylib"/>
219-
<Exec Command="ln -sf /usr/local/lib/libtesseract.dylib $(OutDir)x64/libtesseract41.dylib"/>
218+
<Exec Command="ln -sf /usr/local/lib/liblept.dylib $(OutDir)x64/libleptonica-1.80.0.dylib" />
219+
<Exec Command="ln -sf /usr/local/lib/libtesseract.dylib $(OutDir)x64/libtesseract41.dylib" />
220220
</Target>
221221
</Project>
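
The two commands in the `SymlinkLinuxDependencies` target above are plain `ln -sf` calls. In this commit the link names still carry `1.80.0` and `41`; the diff does not show whether they are meant to change for 5.2. As a rough sketch of what the Linux target does, the script below repeats the same two commands inside a temporary directory. The `work`, `sys_lib_dir` and `out_dir` paths and the empty placeholder files are assumptions made for illustration, standing in for the system library directory and `$(OutDir)`.

```shell
#!/bin/sh
# Sketch only: reproduces the symlink step of the SymlinkLinuxDependencies
# target in a throwaway directory. All paths below are stand-ins.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

sys_lib_dir="$work/usr/lib/x86_64-linux-gnu"   # stand-in for the system lib dir
out_dir="$work/bin/Debug"                      # stand-in for $(OutDir)
mkdir -p "$sys_lib_dir" "$out_dir/x64"

# Empty placeholder files standing in for the installed native libraries.
: > "$sys_lib_dir/liblept.so"
: > "$sys_lib_dir/libtesseract.so.4"

# Same commands as the Exec elements: -s makes a symbolic link, -f replaces
# an existing link so repeated builds do not fail.
ln -sf "$sys_lib_dir/liblept.so" "$out_dir/x64/libleptonica-1.80.0.so"
ln -sf "$sys_lib_dir/libtesseract.so.4" "$out_dir/x64/libtesseract41.so"

# Show where each link points.
for link in "$out_dir"/x64/*.so; do
    printf '%s -> %s\n' "${link##*/}" "$(readlink "$link")"
done
```

Because of `-f`, the script can be run repeatedly without failing on links left over from an earlier build, which is presumably why the target uses it after every build.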

src/Tesseract.NetCore31Tests/Tesseract.NetCore31Tests.csproj

+3-3
@@ -37,9 +37,9 @@
3737
</ItemGroup>
3838

3939
<ItemGroup>
40-
<PackageReference Include="Microsoft.NET.Test.Sdk" Version="16.8.0" />
41-
<PackageReference Include="nunit" Version="3.12.0" />
42-
<PackageReference Include="NUnit3TestAdapter" Version="3.17.0">
40+
<PackageReference Include="Microsoft.NET.Test.Sdk" Version="17.3.2" />
41+
<PackageReference Include="nunit" Version="3.13.3" />
42+
<PackageReference Include="NUnit3TestAdapter" Version="4.3.0">
4343
<PrivateAssets>all</PrivateAssets>
4444
<IncludeAssets>runtime; build; native; contentfiles; analyzers; buildtransitive</IncludeAssets>
4545
</PackageReference>

src/Tesseract.Tests/Leptonica/ColorTests.cs

+12-1
@@ -22,7 +22,7 @@ public void Color_CastColorToNetColor()
2222
Assert.That(castColor.A, Is.EqualTo(color.Alpha));
2323
}
2424
#endif
25-
25+
2626
[TestCase]
2727
public void Color_ConvertColorToNetColor()
2828
{
@@ -33,5 +33,16 @@ public void Color_ConvertColorToNetColor()
3333
Assert.That(castColor.B, Is.EqualTo(color.Blue));
3434
Assert.That(castColor.A, Is.EqualTo(color.Alpha));
3535
}
36+
37+
[TestCase]
38+
public void Color_ConvertNetColorToColor()
39+
{
40+
var color = System.Drawing.Color.FromArgb(100, 150, 200);
41+
var castColor = color.ToPixColor();
42+
Assert.That(color.R, Is.EqualTo(castColor.Red));
43+
Assert.That(color.G, Is.EqualTo(castColor.Green));
44+
Assert.That(color.B, Is.EqualTo(castColor.Blue));
45+
Assert.That(color.A, Is.EqualTo(castColor.Alpha));
46+
}
3647
}
3748
}

src/Tesseract.sln

+2-2
@@ -1,6 +1,6 @@
11
Microsoft Visual Studio Solution File, Format Version 12.00
2-
# Visual Studio Version 16
3-
VisualStudioVersion = 16.0.29409.12
2+
# Visual Studio Version 17
3+
VisualStudioVersion = 17.3.32929.385
44
MinimumVisualStudioVersion = 10.0.40219.1
55
Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "Tesseract", "Tesseract\Tesseract.csproj", "{AB8F7CF1-E75B-4BD3-8853-2348ECDEA969}"
66
EndProject

src/Tesseract/Tesseract.csproj

+8-12
@@ -1,6 +1,6 @@
11
<Project Sdk="Microsoft.NET.Sdk">
22
<PropertyGroup>
3-
<TargetFrameworks>netstandard2.0;net40;net45;net48</TargetFrameworks>
3+
<TargetFrameworks>netstandard2.0;net47;net48</TargetFrameworks>
44
</PropertyGroup>
55
<PropertyGroup>
66
<AllowUnsafeBlocks>true</AllowUnsafeBlocks>
@@ -9,25 +9,21 @@
99
<Authors>Charles Weld</Authors>
1010
<Company />
1111
<Product>Tesseract</Product>
12-
<Description>Tesseract 4 adds a new neural net (LSTM) based OCR engine which is focused on line recognition, but also still supports the legacy Tesseract OCR engine of Tesseract 3 which works by recognizing character patterns. Compatibility with Tesseract 3 is enabled by using the Legacy OCR Engine mode (--oem 0). It also needs traineddata files which support the legacy engine, for example those from the tessdata repository.</Description>
12+
<Description>Tesseract 5 adds a new neural net (LSTM) based OCR engine which is focused on line recognition, but also still supports the legacy Tesseract OCR engine of Tesseract 3 which works by recognizing character patterns. Compatibility with Tesseract 3 is enabled by using the Legacy OCR Engine mode (--oem 0). It also needs traineddata files which support the legacy engine, for example those from the tessdata repository.</Description>
1313
<Copyright>Copyright 2012-2020 Charles Weld</Copyright>
1414
<PackageReleaseNotes>https://github.com/charlesw/tesseract/blob/master/ChangeLog.md</PackageReleaseNotes>
1515
<PackageProjectUrl>https://github.com/charlesw/tesseract/</PackageProjectUrl>
1616
<RepositoryUrl>https://github.com/charlesw/tesseract/</RepositoryUrl>
1717
<PackageTags>Tesseract Ocr</PackageTags>
18-
<Version>4.1.1</Version>
19-
<AssemblyVersion>4.1.1.0</AssemblyVersion>
18+
<Version>5.2.0</Version>
19+
<AssemblyVersion>5.2.0</AssemblyVersion>
2020
<NeutralLanguage></NeutralLanguage>
2121
<PackageLicenseExpression>Apache-2.0</PackageLicenseExpression>
2222
</PropertyGroup>
23-
<!-- .NET 4.0 references, compilation flags and build options -->
24-
<PropertyGroup Condition=" '$(TargetFramework)' == 'net40'">
25-
<DefineConstants>NET40;NETFULL;SYSTEM_DRAWING_SUPPORT</DefineConstants>
26-
</PropertyGroup>
2723

28-
<!-- .NET 4.5 references, compilation flags and build options -->
29-
<PropertyGroup Condition=" '$(TargetFramework)' == 'net45'">
30-
<DefineConstants>NET45;NETFULL;SYSTEM_DRAWING_SUPPORT</DefineConstants>
24+
<!-- .NET 4.7 references, compilation flags and build options -->
25+
<PropertyGroup Condition=" '$(TargetFramework)' == 'net47'">
26+
<DefineConstants>NET47;NETFULL;SYSTEM_DRAWING_SUPPORT</DefineConstants>
3127
</PropertyGroup>
3228

3329
<!-- .NET 4.8 references, compilation flags and build options -->
@@ -71,4 +67,4 @@
7167
<CopyToOutputDirectory>Never</CopyToOutputDirectory>
7268
</None>
7369
</ItemGroup>
74-
</Project>
70+
</Project>
5.5 KB
Binary file not shown.

src/Tesseract/x64/tesseract.exe

2.1 MB
Binary file not shown.
4.5 KB
Binary file not shown.

src/Tesseract/x86/tesseract.exe

1.75 MB
Binary file not shown.
