Posts tagged sybaseblog
Petabyte Size Data Store Managed by Hadoop & Map Reduce.
Hadoop
——–
Source : http://hadoop.apache.org/ & www.
Today, we’re surrounded by data. People upload videos, take pictures on their cell phones, text friends, update their Facebook status, leave comments around the web, click on ads, and so forth. Machines, too, are generating and keeping more and more data. You may even be reading this book as digital data on your computer screen, and certainly your purchase of this book is recorded as data with some retailer.
The exponential growth of data first presented challenges to cutting-edge businesses such as Google, Yahoo, Amazon, and Microsoft. They needed to go through terabytes and petabytes of data to figure out which websites were popular, what books were in demand, and what kinds of ads appealed to people. Existing tools were becoming inadequate to process such large data sets. Google was the first to publicize MapReduce—a system they had used to scale their data processing needs.
This system aroused a lot of interest because many other businesses were facing similar scaling challenges, and it wasn’t feasible for everyone to reinvent their own proprietary tool. Doug Cutting saw an opportunity and led the charge to develop an open source version of this MapReduce system, called Hadoop. Soon after, Yahoo and others rallied around to support this effort.
What is Hadoop?
————–
Hadoop is an open source framework for writing and running distributed applications that process large amounts of data. Distributed computing is a wide and varied field, but the key distinctions of Hadoop are that it is:
1. Accessible: Hadoop runs on large clusters of commodity machines or on cloud computing services such as Amazon’s Elastic Compute Cloud (EC2).
2. Robust: Because it is intended to run on commodity hardware, Hadoop is architected with the assumption of frequent hardware malfunctions. It can gracefully handle most such failures.
3. Scalable: Hadoop scales linearly to handle larger data by adding more nodes to the cluster.
4. Simple: Hadoop allows users to quickly write efficient parallel code.
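To make the MapReduce model behind Hadoop a little more concrete, here is a minimal single-machine sketch in Python. It is not Hadoop code and does not use any Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration. It only shows the shape of the computation: a map step that emits key/value pairs, a grouping step, and a reduce step that combines the values for each key.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    sample = [
        "Hadoop runs on commodity machines",
        "Hadoop scales by adding machines",
    ]
    print(word_count(sample))
```

In a real cluster the map and reduce steps would run in parallel on many nodes, with the framework handling the grouping and any node failures; the sketch keeps everything in one process so the data flow is easy to follow.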
Comparing SQL databases and Hadoop:
————————————
Given that Hadoop is a framework for processing data, what makes it better than standard relational databases, the workhorse of data processing in most of today’s applications? One reason is that SQL (structured query language) is by design targeted at structured data. Many of Hadoop’s initial applications deal with unstructured data such as text. From this perspective Hadoop provides a more general paradigm than SQL.
For working only with structured data, the comparison is more nuanced. In principle, SQL and Hadoop can be complementary, as SQL is a query language which can be implemented on top of Hadoop as the execution engine. But in practice, SQL databases tend to refer to a whole set of legacy technologies, with several dominant vendors, optimized for a historical set of applications. Many of these existing commercial databases are a mismatch to the requirements that Hadoop targets.
Some production implementations of Hadoop:
——————————————————
Complete List @ http://wiki.apache.org/hadoop/PoweredBy
Sybase IQ
———
Sybase IQ : http://www.computerworld.com/s/article/9221355/Updated_Sybase_IQ_supports_Hadoop_MapReduce_Big_Data_
EBay
—-
532-node cluster (8 * 532 cores, 5.3 PB).
Heavy usage of Java MapReduce, Pig, Hive, HBase
Using it for Search optimization and Research.
Facebook
——-
We use Hadoop to store copies of internal log and dimension data sources and use it as a source for reporting/analytics and machine learning.
Currently we have 2 major clusters:
A 1100-machine cluster with 8800 cores and about 12 PB raw storage.
A 300-machine cluster with 2400 cores and about 3 PB raw storage.
Each (commodity) node has 8 cores and 12 TB of storage.
We are heavy users of both the streaming and the Java APIs. We have built a higher level data warehousing framework using these features called Hive (see http://hadoop.apache.org/hive/). We have also developed a FUSE implementation over HDFS.
LinkedIn
———
We have multiple grids divided up based upon purpose.
Hardware:
120 Nehalem-based Sun x4275, with 2×4 cores, 24GB RAM, 8x1TB SATA
580 Westmere-based HP SL 170x, with 2×4 cores, 24GB RAM, 6x2TB SATA
1200 Westmere-based SuperMicro X8DTT-H, with 2×6 cores, 24GB RAM, 6x2TB SATA
Software:
CentOS 5.5 -> RHEL 6.1
Sun JDK 1.6.0_14 -> Sun JDK 1.6.0_20 -> Sun JDK 1.6.0_26
Apache Hadoop 0.20.2+patches -> Apache Hadoop 0.20.204+patches
Pig 0.9 heavily customized
Azkaban for scheduling
Hive, Avro, Kafka, and other bits and pieces…
Twitter
——–
We use Hadoop to store and process tweets, log files, and many other types of data generated across Twitter. We use Cloudera’s CDH2 distribution of Hadoop, and store all data as compressed LZO files.
We use both Scala and Java to access Hadoop’s MapReduce APIs.
We use Pig heavily for both scheduled and ad-hoc jobs, due to its ability to accomplish a lot with few statements.
We employ committers on Pig, Avro, Hive, and Cassandra, and contribute much of our internal Hadoop work to open source (see hadoop-lzo).
For more on our use of Hadoop, see the following presentations: Hadoop and Pig at Twitter and Protocol Buffers and Hadoop at Twitter
Yahoo!
——–
More than 100,000 CPUs in >40,000 computers running Hadoop
Our biggest cluster: 4500 nodes (2*4cpu boxes w 4*1TB disk & 16GB RAM)
Used to support research for Ad Systems and Web Search
Also used to do scaling tests to support development of Hadoop on larger clusters
Our Blog – Learn more about how we use Hadoop.
>60% of Hadoop Jobs within Yahoo are Pig jobs.
Implementation of Function String in Sybase Replication Server(SRS)
These experiences were shared by the senior DBAs named below. I hope they help you understand function strings from an implementation point of view in a replication environment:
Craig Oakley, Senior DBA
—————————--
We used function strings when we wanted to replicate all columns to some servers and only selected columns to other (web-facing) servers. This was particularly useful before Rep Server allowed multiple RepDefs on the same table. One concern was text columns that were not being replicated to the web-facing server. We had to create a function string to get a text pointer (we used a one-row table and just updated all the text columns on top of each other, as the value was not needed on that server), because failure to get a text pointer caused the DSI to go down, and we could not specify that as a condition to ignore.
Beyond this, I would imagine function strings could help specify how you want the update to be done, which could be a performance improvement. It would also allow for a different implementation at the replicate than there is at the primary (such as a table at the primary being two joined tables at the replicate).
Sukhesh Nair, Senior Sybase DBA
———————————–
We used to have a setup where data was replicated from Sybase to Oracle as well as to a warm standby Sybase server. Rep Server function strings helped in filtering the data that needed to be passed to Oracle. They helped immensely in streamlining the data flow to the targets by manipulating the incoming data. I feel this is one of the most advanced and useful, yet least used, capabilities of Sybase Rep Server.
The deterrent could be the complexity it introduces to the replication system. The setup we had worked wonderfully and never gave us any major problems, but without proper monitoring (which needs to be scripted by DBAs) it was hard to maintain. Many of the current Rep Server administrators I see do not have adequate knowledge or experience of handling function strings.
Rey Wang, Senior Sybase DBA
————————-
You can map the delete to a no-op with a function string.
Partha Gogoi, Senior DBA
————————-
We use function strings to transform data at the replicate. We have databases being replicated from Toronto and New York to London, Sydney and Singapore, and the client ids are transformed at the replicate because, as per business requirements, the client ids are different at each site. Of course, having a universal client id would simplify things, but the systems and databases at each site grew independently until replication was set up, and it would be a lot of rework to change all the client ids at the replicate sites.
Øystein Grinaker, Senior DBA
—————————
A function string can be used to change the default behaviour.
Say you delete a row in a table on the PDB (primary database), but you do not want to delete the row on the RDB (replicate database). Then make a change in rs_delete: you can make rs_delete perform just a logical delete by updating a delete marker for that specific row.
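The idea of overriding rs_delete can be sketched in a few lines of Python. This is only a model of the concept, not Replication Server's RCL syntax: the templates use Python's own string formatting, and the names render, DEFAULT_TEMPLATES, LOGICAL_DELETE and NO_OP, the table orders and the column deleted_flag are all hypothetical. It shows a replicated delete producing a different statement at the replicate depending on which override is in place, including the no-op mapping mentioned earlier.

```python
# Default behaviour: a replicated delete removes the row at the replicate.
DEFAULT_TEMPLATES = {
    "rs_delete": "delete from {table} where {key} = {old_key}",
}

# Hypothetical overrides standing in for customised function strings.
LOGICAL_DELETE = {
    "rs_delete": "update {table} set deleted_flag = 1 where {key} = {old_key}",
}
NO_OP = {"rs_delete": ""}  # empty output: the delete is ignored at the replicate


def render(function, operation, overrides=None):
    """Return the SQL text the replicate would run for one replicated operation."""
    templates = {**DEFAULT_TEMPLATES, **(overrides or {})}
    return templates[function].format(**operation)


if __name__ == "__main__":
    op = {"table": "orders", "key": "order_id", "old_key": 42}
    print(render("rs_delete", op))                  # physical delete
    print(render("rs_delete", op, LOGICAL_DELETE))  # logical delete
    print(repr(render("rs_delete", op, NO_OP)))     # nothing applied
```

The primary still performs a real delete in all three cases; only the statement applied at the replicate changes, which is what lets the two sites behave differently.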
Source : Linkedin.com
What’s in YOUR Architecture?
What’s in your architecture? Probably not enough if you’re using just data modeling to integrate all the components of your enterprise, especially your information. To lay the foundation of this paper, let’s start from the same level of understanding: the need for a common approach to managing all the aspects of information to enable rapid business performance in the 21st century economy.
Competition in the marketplace is always fierce. To stay abreast, organizations must constantly analyze their customer needs and expectations, enhance or innovate their business processes, and deliver products and services that create exceptional customer value. Organizations also need to be customer-centric to forge long-term relationships with clients and consumers. Only organizations that are agile enough to respond to volatile market conditions with innovation, expedited time-to-market processes, and reduced costs can differentiate themselves from the competition. Such agility occurs when a company’s IT operations are closely aligned with its business operations. IT needs to understand business to implement technologies and applications that support the current and future business goals.
Read Full Article @ http://www.sybase.com/files/White_Papers/Sybase_Whats_in_Your_Architecture_WP.pdf
K21– ASE’S KERNEL DESIGN FOR THE 21ST CENTURY – ASE 15.7’s THREADED KERNEL K21
Basic Difference:
Process Kernel:
Pre-15.7 kernel (except Windows)
Each engine is a separate process
Retained in 15.7 for risk mitigation
Threaded Kernel:
Default kernel for 15.7
Each engine is a thread of a single process
Additional threads for handling I/O, etc.
ASE on Windows has always been thread based
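As a loose analogy for the threaded model (and nothing more; this says nothing about how ASE is actually implemented), the Python sketch below starts several "engines" as threads and records the process id each one sees. The function name run_engines_as_threads is made up for illustration. Because all the threads live in one process, they all report the same process id, whereas engines started as separate processes would each report their own.

```python
import os
import threading


def run_engines_as_threads(engine_count):
    """Start one thread per 'engine' and record the process id each one sees."""
    seen = []
    lock = threading.Lock()

    def engine(number):
        # Every thread belongs to the same process, so os.getpid() is identical.
        with lock:
            seen.append((number, os.getpid()))

    threads = [threading.Thread(target=engine, args=(n,)) for n in range(engine_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sorted(seen)


if __name__ == "__main__":
    for number, pid in run_engines_as_threads(3):
        print(f"engine {number} runs in process {pid}")
```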
http://www.sybase.com/files/Product_Overviews/ASE-15.7-New-Threaded-Kernel.pdf
Survey among Sybase and Oracle customers – Bloor Research
1. License fees: 75% of respondents thought that Sybase ASE was less expensive, by an average of 28%.
2. Support costs: 78% of respondents thought that Sybase ASE was less expensive, by an average of 32%.
3. Number of database administrators: nobody thought that Oracle required fewer DBAs and 61% thought that Sybase required fewer DBAs. On average the saving was 32%.
4. Frequency of security patches: again, nobody thought that this was less frequent in the case of Oracle while 68% thought that this was the case with Sybase, with an average reduction of 22%.
5. Issue resolution: 73% thought that Sybase was faster at resolving issues, typically being 21% faster.
Reference: http://www.sybase.com/files/White_Papers/SYBASE_ASE_Bloor_Research_TCO_vs_Oracle.pdf
dataserver -X : ASE Diagnostic Interface
$ dataserver -X     <----- Runs the dataserver binary in sybmon mode (the server starts as sybmon, not as dataserver)
Enter password:     <----- The password is quine
Adaptive Server Enterprise/12.5.3/EBF 12331 ESD#1/P/Sun_svr4/OS 5.8/ase1253/1900/64-bit/FBO/Tue Jan 25 08:52:58 2005
Sybase Adaptive Server Enterprise Diagnostic Interface
Confidential property of Sybase, Inc.
Copyright 1987, 2005
Sybase, Inc. All rights reserved.
Unpublished rights reserved under U.S. copyright laws.
This software contains confidential and trade secret information of Sybase,
Inc. Use, duplication or disclosure of the software and documentation by
the U.S. Government is subject to restrictions set forth in a license
agreement between the Government and Sybase, Inc. or other written
agreement specifying the Government’s rights to use the software and any
applicable FAR provisions, for example, FAR 52.227-19.
Sybase, Inc. One Sybase Drive, Dublin, CA 94568, USA
WARNING: For use by authorized personnel only.
If you are not an employee of Sybase, Inc., or
have not been authorized by a qualified employee
of Sybase, Inc., please terminate this program now.
No servers found using directory: /data/sybase/sqlserver/12.5.3
> cat /data/sybase/sqlserver/12.5.3/ASE-12_5/     <----- cat must be given the location of the .krg file
Shared memory regions currently cataloged:
Name         Key          Id     Status
-----------------------------------------
PROD_ASE1    0x64d28ab5   5603   Available
PROD_ASE2    0x64d28adf   5604   Available
PROD_ASE3    0x64d28add   205    Available
PROD_ASE4    0x64d28ae5   206    Available
> attach PROD_ASE3     <----- Attaches to a shared memory segment for analysis
Attaching to server PROD_ASE3, using shared memory id: 205
PROD_ASE3:active> help ?
Help text for Sybmon commands
Usage: <help | ?> [<command group name> | all]
PROD_ASE3:active> detach     <----- Detaches from the shared memory segment
> quit     <----- Exits sybmon mode
you have mail in /var/mail//sybase
PROD_ASE3:active> who ?
List all active server processes, process for specified spid,
or only busy, idle or blocked processes
Usage: who [ <spid> | busy | blocked | idle ]
PROD_ASE3:active> locks ?
Display all the locks held or waited for
Usage: locks
PROD_ASE3:active> traceflags ?
List all active traceflags
Usage: traceflags [( 1 | 2 )]
PROD_ASE3:active> opentables ?
Display open tables for one or all active database processes
Usage: opentables [<spid> | <kpid> | <SYB_PROC *>]
PROD_ASE3:active> memdump ?
Dump server’s shared memory region(s) to a disk file
Usage: memdump [<file name> [[nocache | cache] [halt| nohalt] [proc | noproc] [nounused | unused]] | [full]]
The first of each argument pair is the default.
PROD_ASE3:active> stacktrace ?
Display stack trace for a server process
Usage: stack <kpid> | <spid> | <syb_proc addr in hex> | all | run
PROD_ASE3:active> status ?
Show status of shared memory and sybmon program
Usage: status
PROD_ASE3:active> status
Attached to server: PROD_ASE3
Logging: off
Display: on
Timestamplog: on
Sybmon Diagnostics:
General Diagnostics: off
Print Module Diagnostics: off
Virtual Memory Manager Diagnostics: off
Virtual Machine Diagnostics: off
Dump file mapping mode: normal
PROD_ASE3:active> version ?
Display the version of this program
sybaseblog.com completed 2 yrs on Oct 31st!
All,
Last Monday, Oct 31st 2011, sybaseblog.com completed 2 years!
Thanks for all your support and wishes!
Happy Sybase Learning !
Cheers…
Sybase dataserver binary output and dbcc command.
When we execute dbcc sqltext without turning on traceflag 3604 or 3605, where does the output of sqltext go? To the errorlog?
No. Output goes to the errorlog only when traceflag 3605 is on.
Let’s explore the RUN server file again:
/opt/sybase/ASE-15_0/bin/dataserver \
-d/opt/sybase/devices/master.dat \
-e/opt/sybase/ASE-15_0/install/PROD_ASE_DS1.log \
-c/opt/sybase/ASE-15_0/PROD_ASE_DS1.cfg \
-M/opt/sybase/ASE-15_0 \
-sPROD_ASE_DS1 > /dev/null
-e : specifies the errorlog file, where all error and informational messages are written.
As we know, when we run any binary, its output is displayed on the screen.
What about the output of the $SYBASE/$SYBASE_ASE/bin/dataserver binary?
Generally we redirect it to the null device (/dev/null), as above.
Now I am redirecting the output to a file instead, /tmp/sybaselog.out:
/opt/sybase/ASE-15_0/bin/dataserver \
-d/opt/sybase/devices/master.dat \
-e/opt/sybase/ASE-15_0/install/PROD_ASE_DS1.log \
-c/opt/sybase/ASE-15_0/PROD_ASE_DS1.cfg \
-M/opt/sybase/ASE-15_0 \
-sPROD_ASE_DS1 > /tmp/sybaselog.out
Run the dbcc sqltext command; the result is displayed in the dataserver output file, without any traceflag.
This means that when we need dbcc output on the user’s screen or in the errorlog, we need to enable traceflag 3604 or 3605 respectively;
otherwise it goes to the standard output of the dataserver binary, which lands in the output file if we are redirecting it to one.
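The routing rule described above can be written down as a small decision function. This is a Python model of the behaviour observed in this post, not anything ASE exposes; the function name dbcc_output_destinations and the destination labels are invented for illustration, and the case where both traceflags are on (returning both destinations) is an assumption that this post does not test.

```python
def dbcc_output_destinations(traceflags):
    """Return where dbcc output is expected to go for a set of active traceflags."""
    destinations = []
    if 3604 in traceflags:
        destinations.append("client session")
    if 3605 in traceflags:
        # Assumption: with both flags on, output goes to both places.
        destinations.append("errorlog")
    if not destinations:
        # Neither flag set: output goes to the dataserver's standard output,
        # i.e. wherever the RUN server file redirects it (/dev/null or a file).
        destinations.append("dataserver stdout")
    return destinations


if __name__ == "__main__":
    for flags in (set(), {3604}, {3605}):
        print(sorted(flags), "->", dbcc_output_destinations(flags))
```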
[sybase@localhost ~]$ isql -Usa -SPROD_ASE_DS1
Password:
1> select @@spid
2> go
------
14
(1 row affected)
1> select name from sysdatabases
2> go
name
------------------------------------------------------------
master
model
sybsecurity
sybsystemdb
sybsystemprocs
tempdb
(6 rows affected)
1> dbcc sqltext(14)
2> go
DBCC execution completed. If DBCC printed error messages, contact a user with
System Administrator (SA) role.
1> dbcc sqltext(14)
2> go
DBCC execution completed. If DBCC printed error messages, contact a user with
System Administrator (SA) role.
1>
[sybase@localhost ~]$ tail -f /tmp/sybaselog.out
00:00:00000:00001:2011/09/28 08:51:15.11 server ASE's default unicode sort order is 'binary'.
00:00:00000:00001:2011/09/28 08:51:15.11 server ASE's default sort order is:
00:00:00000:00001:2011/09/28 08:51:15.11 server 'bin_iso_1' (ID = 50)
00:00:00000:00001:2011/09/28 08:51:15.11 server on top of default character set:
00:00:00000:00001:2011/09/28 08:51:15.11 server 'iso_1' (ID = 1).
00:00:00000:00001:2011/09/28 08:51:15.11 server Master device size: 500 megabytes, or 256000 virtual pages. (A virtual page is 2048 bytes.)
00:00:00000:00001:2011/09/28 08:51:15.11 kernel Warning: Cannot set console to nonblocking mode, switching to blocking mode.
SQL Text: SELECT fid=right(space(80)+isnull(convert(varchar(80),fid),'NULL'),3), spid=right(space(80)+isnull(convert(varchar(80),spid),'NULL'),4), status=SUBSTRING(convert(varchar(80),status),1,10), loginame=SUBSTRING(convert(varchar(80),loginame),1,8), origname=SUBSTRING(convert(varchar(80),origname),1,8), hostname=SUBSTRING(convert(varchar(80),hostname),1,21), blk_spid=right(space(80)+isnull(convert(varchar(80),blk_spid),'NULL'),8), dbname=SUBSTRING(convert(varchar(80),dbname),1,6), tempdbname=SUBSTRIN
SQL Text: select @@spid
SQL Text: select name from sysdatabases
Please let me know if you have any more thoughts!
Compiled Objects in ASE
Adaptive Server uses compiled objects to contain vital information about each database and to help you access and manipulate data.
A compiled object is any object that requires entries in the sysprocedures table, including:
- Check constraints
- Defaults
- Rules
- Stored procedures
- Extended stored procedures
- Triggers
- Views
- Functions
- Computed columns
- Partition conditions
Compiled objects are created from source text, that is, the SQL statements that describe and define the compiled object.
When a compiled object is created, Adaptive Server:
- Parses the source text, catching any syntactic errors, to generate a parsed tree.
- Normalizes the parsed tree to create a normalized tree, which represents the user statements in a binary tree format. This is the compiled object.
- Stores the compiled object in the sysprocedures table.
- Stores the source text in the syscomments table.
object id 98 and 99 in ASE:sysencryptkeys & ALLOCATION
Yesterday I came across two object ids, 98 and 99. Running the object_name function with these ids returned two objects, ALLOCATION and sysencryptkeys, but there were no details for them in sysobjects. I found them in all databases.
On investigation I found that:
sysencryptkeys (object id 98) is a system table that plays a role when encrypted columns are used. More details:
http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc36274.1502/html/tables/tables32.htm
http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc00968.1502/html/Encryption/title.htm
An ALLOCATION page (object id 99) is not a table but a space-housekeeping page in ASE that manages page allocation. Space management in ASE is organised around allocation and OAM pages. Each device is divided into units of 256 pages, so you will find an allocation page every 256 pages. This page keeps track of which extents (8 pages each) are allocated, and to which object. More info:
http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc00841.1502/html/phys_tune/X37428.htm
http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc00841.1502/html/phys_tune/phys_tune44.htm
98 sysencryptkeys
99 ALLOCATION
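The 256-page and 8-page figures above lend themselves to a little arithmetic. The Python sketch below works out, for a given page number, which allocation page covers it and which extent it falls in. It assumes that the allocation page is the first page of each 256-page unit (pages 0, 256, 512 and so on), which the post does not state explicitly, and the function names are made up for illustration.

```python
PAGES_PER_ALLOCATION_UNIT = 256  # one allocation page every 256 pages
PAGES_PER_EXTENT = 8             # an extent is 8 pages
EXTENTS_PER_ALLOCATION_UNIT = PAGES_PER_ALLOCATION_UNIT // PAGES_PER_EXTENT


def allocation_page_for(page_number):
    """Page number of the allocation page covering page_number.

    Assumption: the allocation page is the first page of its 256-page unit.
    """
    return page_number - (page_number % PAGES_PER_ALLOCATION_UNIT)


def extent_index_for(page_number):
    """Zero-based position of the page's extent within its allocation unit."""
    return (page_number % PAGES_PER_ALLOCATION_UNIT) // PAGES_PER_EXTENT


if __name__ == "__main__":
    for page in (0, 255, 256, 1000):
        print(page, allocation_page_for(page), extent_index_for(page))
    print("extents per allocation unit:", EXTENTS_PER_ALLOCATION_UNIT)
```

Under these assumptions each allocation page tracks 32 extents, which is why a single page is enough to describe who owns every extent in its unit.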