Skip to content

Enhancement: How to use LOBs like streams (in a file-like way)? #128

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
hvbtup opened this issue Dec 21, 2017 · 6 comments
Open

Enhancement: How to use LOBs like streams (in a file-like way)? #128

hvbtup opened this issue Dec 21, 2017 · 6 comments

Comments

@hvbtup
Copy link

hvbtup commented Dec 21, 2017

This is not a cx_Oracle error, but an enhancement request.

I would like to use a CLOB in a way compatible to io.TextIOBase,
such that I can the usual read() and seek() methods.
This seems a very natural approach to me.

However, I can't get it to work.
The problem is specifying the encoding (for writing the content to a binary file later).
The TextIOBase class provides an attribute "encoding", but I cannot set it.

A very similar approach for BLOBs and io.RawIOBase does work fine
(of course, there is no encoding attribute then).

This is what I tried:

class UnbufferedClobReader(io.TextIOBase):
    """
    A file-like wrapper for a read-only cx_Oracle CLOB object.
    """
    def __init__(self, clobLocator, encoding):
        self.clobLocator = clobLocator
        self.offset = 0
        self.encoding = encoding
        self.size = clobLocator.size()

    def seekable(self):
        return True
    
    def seek(self, offset, whence):
        if whence == 0:
            self.offset = offset
        elif whence == 1:
            self.offset += offset
            if self.offset < 0:
                self.offset = 0
        elif whence == 2:
            if offset <= 0 and -offset <= self.size:
                self.offset = self.size + offset
            else:
                raise IOError(96, "Invalid offset for CBlobReader")
        else:
            self._unsupported("seek")
        return self.offset

    def readable(self):
        return True
    
    def read(self, size):
        size = min(self.size - self.offset, size)
        if size == 0:
            return ""
        chunk = self.clobLocator.read(self.offset + 1, size)
        assert len(chunk) == size
        self.offset += size
        return chunk

Note: Actually I only need the read method in my code.

The constructor fails with the exception
AttributeError: attribute 'encoding' of '_io._TextIOBase' objects is not writable

Any ideas how specifying the encoding (for when the stream has to be written to a byte stream) can be accomplished?

It would be fine if cx_Oracle provided such wrapper-classes directly.

  1. What is your version of Python? Is it 32-bit or 64-bit?
    3.6.4 win32-AMD64

  2. What is your version of cx_Oracle?
    6.0.3

  3. What is your version of the Oracle client (e.g. Instant Client)? How was it
    installed? Where is it installed?
    12.2.0.1.0 Instant Client (working fine)

  4. What is your version of the Oracle Database?
    12.1

  5. What is your OS and version?
    Windows

  6. What compiler version did you use? For example, with GCC, run
    n/a

  7. What environment variables did you set? How exactly did you set them?
    n/a

  8. What exact command caused the problem (e.g. what command did you try to
    install with)? Who were you logged in as?
    n/a

  9. What error(s) you are seeing?
    AttributeError: attribute 'encoding' of '_io._TextIOBase' objects is not writable

@anthony-tuininga
Copy link
Member

To answer your first question: you can get the encoding from the connection. There is an attribute on connections called "encoding" which you would use for CLOBs and "nencoding" which you would use for NCLOBs. Currently the connection is not directly available from the LOB, but since internally the reference is there, that could be exposed. I'll consider adding support directly -- but it would help if you could provide code "before" and "after" this enhancement is implemented, that demonstrates why it would be advantageous! Thanks for the suggestion in any case!

@hvbtup
Copy link
Author

hvbtup commented Dec 22, 2017

Hmm, it's a bit more complicated in my case.

The encoding is the encoding intended for writing the CLOB to a file; this is different from the connection encoding (for example, the connection uses AL32UTF8 aka "utf-8" in Python, but the text should be serialized using "windows-1252".

In my concrete case I could work around the AttributeError by simply not using io.TextIOBase as a base class.

My use case is that the LOB is just one of several different ways for reading and writing large text or binary data. That is, I consider a DB LOB just as a special case of a text or byte stream.

With my class, I can then use the LOB just as I would use any other stream.

For example, with an UnbufferedClobReader instance, I can use a construct like this:

def write_text_file(stream, filename):
    "Write the text stream to a file, works with io.TextIOBase(-like) streams"
    open(filename, "wt", encoding=stream.encoding).write(filename)

clobvar = curs.var(cx_Oracle.CLOB)
clobvar.setvalue(0, "")
curs.execute("BEGIN :po_clob := Create_a_Clob_Somehow%(...,); END;,
    ..., # imageine some input here
    po_clob = clobvar
    )
text_reader = UnbufferedClobReader(clobvar.getvalue(), encoding)
write_text_file(text_reader, "output.txt")

BTW doesn't JDBC offer a streaming API for LOBs, too?
https://docs.oracle.com/javase/7/docs/api/java/sql/Blob.html

I'd like to see something similar for cx_Oracle, and that would probably be derive from the base classes in the io module.

With a more sophisticated UnbufferedClobReader class (implementing some more of the methods definied in the io module), I could also do

for line in text_reader:
    do_some_processing(line)

Working with LOBs would feel more natural that way, IMHO.

@doerwalter
Copy link

Can't you use something like this (which is taken from https://github.com/LivingLogic/LivingLogic.Python.xist/blob/master/src/ll/orasql/__init__.py):

class LOBStream:
	"""
	A :class:`LOBStream` object provides streamlike access to a ``BLOB`` or ``CLOB``.
	"""

	def __init__(self, value):
		self.value = value
		self.pos = 0

	def readall(self):
		"""
		Read all remaining data from the stream and return it.
		"""
		result = self.value.read(self.pos+1)
		self.pos = self.value.size()
		return result

	def readchunk(self):
		"""
		Read a chunk of data from the stream and return it. Reading is done in
		optimally sized chunks.
		"""
		size = self.value.getchunksize()
		bytes = self.value.read(self.pos+1, size)
		self.pos += size
		if self.pos >= self.value.size():
			self.pos = self.value.size()
		return bytes

	def read(self, size=None):
		"""
		Read :obj:`size` bytes/characters from the stream and return them.
		If :obj:`size` is :const:`None` all remaining data will be read.
		"""
		if size is None:
			return self.readall()
		if size <= 0:
			return self.readchunk()
		data = self.value.read(self.pos+1, size)
		self.pos += size
		if self.pos >= self.value.size():
			self.pos = self.value.size()
		return data

	def reset(self):
		"""
		Reset the stream so that the next :meth:`read` call starts at the
		beginning of the LOB.
		"""
		self.pos = 0

	def seek(self, offset, whence=0):
		"""
		Seek to the position :obj:`offset` in the LOB. The :obj:`whence` argument
		is optional and defaults to ``0`` (absolute file positioning);
		The other allowed value is ``1`` (seek relative to the current position).
		"""
		if whence == 0:
			self.pos = whence
		elif whence == 1:
			self.pos += whence
		else:
			raise ValueError(f"unkown whence: {whence!r}")
		size = self.value.size()
		if self.pos >= size:
			self.pos = size
		elif self.pos < 0:
			self.pos = 0

But I agree that it would be great if cx_Oracle's BLOB/CLOB supported Python stream interface naturally.

@hvbtup
Copy link
Author

hvbtup commented Apr 3, 2018

Of course I can use something similar (in fact, I do). The point is that this is like reinventing the wheel if every developer has to do this him/herself.

@Elias481
Copy link

Some comments on this..

If You want to set the encoding attribute of Your TextIOBase subclass You could do so:

class UnbufferedClobReader(io.TextIOBase):`
    encoding = None
    def __init__(self, clobLocator, encoding):
        self.clobLocator = clobLocator
        self.offset = 0
        self.encoding = encoding
        self.size = clobLocator.size()
...

But this is not what You are looking for:
It indicates how the class is converting the unicode strings that You can read/write with it when reading/writing them to the underlying stream (the CLOB object in this case).
So You could/should use it to store the encoding from connection there and implement proper conversion within Your implemented functions for reading/writing the data from/to lob (using unicode/codec functions) using this function. (But note: But this conversion is only needed for Python 2 and never for NCLOB as in Python 3 and for NCLOB the lob.read already returns unicode string object so no conversion to/from unicode needed then.)
So if You are using only Python 3 You would not need this attribute at all - and if You are using Python 2 You need to implement conversion to unicode before returning the value from read (otherwise it would not be a valid/compatible TextIOBase object). Example:

    def read(self, size):
        size = min(self.size - self.offset, size)
        if size == 0:
            return u""
        chunk = self.clobLocator.read(self.offset + 1, size)
        if self.encoding:
            chunk = chunk.decode(self.encoding)
        assert len(chunk) == size
        self.offset += size
        return chunk

The conversion to whatever charset You want is to be implemented separately. So You can open Your target file that need certain encodeing as for example io.TextIOWrapper object that does the encoding from unicode to "windows-1252" if You want to write it to a file and just write the unicode You get from Your reader.
Otherwise use the appropriate codecs function/classes for the conversion or the unicode strings encode method.

But depending on Your overall needs this ist not the most efficient way to implement that.

Anyway it would be convenient if the LOB Objects would provide a method to return such an RawIOBase/TextIOBase object of appropriate type. Even better if it would allow to choose between Raw and Text for the CLOB types to avoid overhead of intermediate Unicode conversion. And maybe for corner-cases also an TextIOBase object for BLOB (where the user can specify the character set for encoding) if it would be implemented anyway.

@anthony-tuininga
Copy link
Member

Thanks for these comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants