Reading text from binary data in a blob [message #683871] |
Sat, 27 February 2021 10:31 |
|
appender
Messages: 1 Registered: February 2021
|
Junior Member |
|
|
We are storing all files as blobs. There is a need to take the text content from several .doc files stored in this way, and insert all of this text into another blob. The problem is getting the text out of the blob. The following code works for .txt files, but not for .doc files (it only returns gibberish for .doc files). I assume that that is because the encoding of .doc files shows up as binary, whereas for .txt files it is UTF8 etc. I've tried all kinds of conversions but haven't been able to find a solution.
Is it perhaps possible to avoid this conversion to a clob altogether, and only work with blobs? I've tried appending blobs to one another, but probably due to some other data in the blob this doesn't work, and only the content of the first appended file is visible in the final result.
declare
v_file_ids numberlist := numberlist();
v_blob blob;
v_total clob;
v_clob clob;
v_nvarchar nvarchar2(32767);
v_start pls_integer;
v_buffer pls_integer;
begin
dbms_lob.createtemporary(v_total, true);
v_file_ids.extend();
v_file_ids(v_file_ids.last) := --random ID;
for i in 1..v_file_ids.count loop
dbms_lob.createtemporary(v_clob, true);
v_start := 1;
v_buffer := 32767;
--gets blob of the file
GetFileContent(v_file_ids(i), v_blob);
--retrieves the text content from the file
for i in 1..ceil(dbms_lob.getlength(v_blob)/v_buffer)
loop
v_nvarchar := utl_raw.cast_to_nvarchar2(dbms_lob.substr(v_blob, v_buffer, v_start));
dbms_lob.writeappend(v_clob, length(v_nvarchar), v_nvarchar);
v_start := v_start + v_buffer;
end loop;
--appends the retrieved text to the main clob
dbms_lob.append(v_total, v_clob);
end loop;
end;
|
|
|
Re: Reading text from binary data in a blob [message #683872 is a reply to message #683871] |
Sat, 27 February 2021 11:17 |
|
Michel Cadot
Messages: 68665 Registered: March 2007 Location: Nanterre, France, http://...
|
Senior Member Account Moderator |
|
|
Welcome to the forum.
Please read the OraFAQ Forum Guide and How to use [code] tags and make your code easier to read.
Also always post your Oracle version, with 4 decimals (query v$version), as often solution depends on it.
Quote:The following code works for .txt files, but not for .doc files (it only returns gibberish for .doc files
So it seems that your procedure GetFileContent reads files as text and not binary.
Anyway, Oracle already does everything for you: have a look at DBMS_LOB.LOADBLOBFROMFILE procedure.
Here's a simple example:
DECLARE
tmp_lob BLOB;
file BFILE;
src_off PLS_INTEGER := 1;
dst_off PLS_INTEGER := 1;
BEGIN
DBMS_LOB.CREATETEMPORARY (tmp_lob, false, dur=>dbms_lob.call);
file := BFILENAME('<FILEDIR>','<FILENAME>');
DBMS_LOB.OPEN (file);
DBMS_LOB.LOADBLOBFROMFILE (tmp_lob, file, dbms_lob.lobmaxsize, src_off, dst_off);
-- <... do what you want with the lob ...> --
END;
/
[Updated on: Sat, 27 February 2021 11:18] Report message to a moderator
|
|
|
|